Description:
Are you passionate about leveraging data to drive innovative AI and machine learning solutions? Do you have a keen eye for detail and a knack for solving complex problems? Join our dynamic team at MELIOR ITS Private Limited as a Data Analyst. Your primary responsibilities will revolve around data preprocessing, harmonization, and building efficient data pipelines to support continuous learning and model updates.
Key Responsibilities:
- Data Cleaning: Preprocesses data for model training, including handling missing values and outliers and ensuring data is in the right format.
- Harmonization Engine: Ensures data consistency and compatibility for model training.
- Data Pipeline: Builds pipelines that support continuous learning and model updates.
- Tool Usage: Uses ML-specific tools and frameworks such as TensorFlow, PyTorch, and Databricks for large-scale data processing.
- Technical Skills: Performs complex data preprocessing and feature engineering, and ensures data is split properly into training, validation, and test sets for ML models.
- Visualization Knowledge: Applies a basic understanding of visualization tools to interpret model performance and results.
- Pattern Recognition: Uses pattern recognition to extract features and improve model accuracy.
Technical Skills:
- Proficiency in the Python programming language, including extensive experience with NumPy and pandas for data manipulation and analysis.
- Strong knowledge of SQL for database operations, including data querying and manipulation.
- Experience with Apache Spark for distributed computing, data processing, and analysis.
- Familiarity with Docker for containerization of applications and microservices.
- Proficiency in Git for version control, including branching, merging, and code collaboration.
- Hands-on experience with Jupyter Notebooks for prototyping, experimentation, and data visualization.
- Familiarity with cloud platforms such as AWS, GCP, or Azure for deployment and scalability of machine learning applications.
- Knowledge of Kafka for real-time data streaming and event-driven architectures.
Preferred Qualifications:
- Bachelor's/Master's degree in Computer Science, Engineering, Mathematics, or a related field.
- Minimum of 3 years of experience in software development, with a focus on back-end systems and machine learning.
- Proven track record of implementing machine learning models and algorithms using TensorFlow and/or PyTorch.
- Proficiency in programming languages such as Python, Java, or Scala.
- Experience with Databricks for large-scale data processing and analytics.
- Knowledge of container orchestration platforms like Kubernetes.
- Experience with automated testing frameworks for machine learning models.
- Understanding of software engineering best practices, including code optimization, performance tuning, and scalability considerations.