Databricks Consultant

Datavail

Mumbai, Maharashtra, India | Full time

Job Title: Senior Associate Developer - Databricks, PySpark, and Spark SQL

Education: Any Graduate

Experience: 5+ years

Location: Mumbai

Key Skills:

• Strong hands-on experience with Databricks, PySpark, and Spark SQL.

• Expertise in Delta Lake, Bronze–Silver–Gold architecture, and Lakehouse patterns.

• Strong experience with cloud platforms (AWS/Azure/GCP).

• Solid understanding of data warehousing, dimensional modeling, and big‑data concepts.

Job Description:

• Build scalable ETL/ELT pipelines using Databricks (PySpark, SQL, Spark Streaming).

• Develop and optimize Delta Lake tables, making use of ACID transactions, schema evolution, and time travel.

• Implement Unity Catalog, data governance, and access control.

• Optimize cluster configurations and job workflows, and tune performance in Databricks.

• Design and implement batch and streaming pipelines using Spark Structured Streaming.

• Integrate Databricks with multiple data sources (RDBMS, APIs, cloud storage, message queues).

• Develop reusable, modular, and automated data processing frameworks.

• Implement CI/CD pipelines for Databricks using GitHub Actions / Azure DevOps / GitLab.

• Automate cluster management and job orchestration using Databricks REST APIs.

• Maintain code quality, unit tests, and documentation.

• Write and optimize complex SQL queries and statements to ensure high performance and efficient data retrieval.

• Apply strong database design skills, including normalization, data modelling, and relational schema creation.

• Conduct performance analysis, troubleshoot database issues such as slow queries or deadlocks, and implement solutions.

• Design and implement database structures, including tables, schemas, views, stored procedures, functions, and triggers.

• Optimize database performance through query tuning, indexing, and performance analysis.

• Ensure data integrity, security, and compliance with standards.

• Strong Python skills combined with expertise in Apache Spark for large-scale data processing. Core abilities include building efficient ETL pipelines, optimizing distributed jobs, and handling large-scale data transformations.

• Expertise in Python programming, Spark APIs, and parallel processing.

• Proficiency in Python (including Pandas and NumPy) for data manipulation and scripting.

• Deep knowledge of PySpark APIs such as DataFrames, RDDs, and Spark SQL for querying and processing.

• Familiarity with RESTful APIs, batch processing, CI/CD, and monitoring data jobs.

• Optimize Spark jobs for performance, troubleshoot issues, and ensure data quality across systems.

• Collaborate with data engineers and scientists to implement workflows, conduct code reviews, and integrate with cloud platforms like AWS or Azure.

• Design, develop, and maintain scalable data pipelines and ETL processes using Azure Databricks.

• Build data transformation workflows using Python or Scala.

• Work with data lakes using Delta Lake.

• Integrate data from multiple sources such as APIs, databases, and cloud storage.

• Monitor and optimize data workflows for performance and reliability.

• Collaborate with data scientists, analysts, and business teams.

Skills

Databricks, PySpark, Spark SQL, Delta Lake, Cloud Platforms (AWS/Azure/GCP), Data Warehousing, ETL/ELT Pipelines, Python Programming, Collaboration, Performance Tuning