Databricks Consultant
Job Title: Senior Associate Developer - Databricks, PySpark, and Spark SQL
Education: Any Graduate
Experience: 5+ years
Location: Mumbai
Key Skills:
• Strong hands-on experience with Databricks, PySpark, and Spark SQL.
• Expertise in Delta Lake, Bronze–Silver–Gold architecture, and Lakehouse patterns.
• Strong experience with cloud platforms (AWS/Azure/GCP).
• Solid understanding of data warehousing, dimensional modeling, and big‑data concepts.
Job Description:
• Build scalable ETL/ELT pipelines using Databricks (PySpark, SQL, Spark Streaming).
• Develop and optimize Delta Lake tables, ACID transactions, schema evolution, and time travel.
• Implement Unity Catalog, data governance, and access control.
• Optimize cluster configurations, job workflows, and performance tuning in Databricks.
• Design and implement batch and streaming pipelines using Spark Structured Streaming.
• Integrate Databricks with multiple data sources (RDBMS, APIs, cloud storage, message queues).
• Develop reusable, modular, and automated data processing frameworks.
• Implement CI/CD pipelines for Databricks using GitHub Actions / Azure DevOps / GitLab.
• Automate cluster management and job orchestration using Databricks REST APIs.
• Maintain code quality, unit tests, and documentation.
• Write and optimize complex SQL queries and statements to ensure high performance and efficient data retrieval.
• Strong database design skills, including normalization, data modeling, and relational schema creation.
• Conduct performance analysis, troubleshoot database issues such as slow queries or deadlocks, and implement solutions.
• Design and implement database structures, including tables, schemas, views, stored procedures, functions, and triggers.
• Optimize database performance through query tuning, indexing, and performance analysis.
• Ensure data integrity, security, and compliance with applicable standards.
• Strong Python skills combined with expertise in Apache Spark for large-scale data processing. Core abilities include building efficient ETL pipelines, optimizing distributed jobs, and handling large-scale data transformations.
• Expertise in Python programming, Spark APIs, and parallel processing.
• Proficiency in Python (including Pandas, NumPy) for data manipulation and scripting.
• Deep knowledge of PySpark APIs such as DataFrames, RDDs, and Spark SQL for querying and processing.
• Familiarity with RESTful APIs, batch processing, CI/CD, and monitoring data jobs.
• Optimize Spark jobs for performance, troubleshoot issues, and ensure data quality across systems.
• Collaborate with data engineers and scientists to implement workflows, conduct code reviews, and integrate with cloud platforms like AWS or Azure.
• Design, develop, and maintain scalable data pipelines and ETL processes using Azure Databricks.
• Build data transformation workflows using Python or Scala.
• Work with data lakes using Delta Lake.
• Integrate data from multiple sources such as APIs, databases, and cloud storage.
• Monitor and optimize data workflows for performance and reliability.
• Collaborate with data scientists, analysts, and business teams.