Description:
What you will do
- Work in teams or in an individual capacity to perform ETL/ELT (extract, transform, load) of data from a variety of data stores: SQL, NoSQL, Hadoop, Neo4j, etc.
- Apply large-scale and/or distributed processing frameworks such as Storm and Spark (a minimal Spark sketch follows this list).
- Build data pipelines using cloud and traditional ETL tools (see the pipeline sketch after this list).
- Review data quality and definitions, and perform data cleansing and data management tasks.
- Work with the engagement team to translate business and analytics requirements into a data strategy for the engagement, including ETL, data modeling, and staging data for analysis.
- Architect the data platform for scalability, repeatability and performance.
- Develop standards for data processes, automate routine tasks, and establish data governance around them.
- Run SQL queries for descriptive analytics and provide formatted result sets (see the query sketch after this list).
- Proactively create presentation materials on data activities for stakeholder discussions.
- Support application testing and production implementation as required.
- Bridge the gap between technical platform needs and business issues.
- Understand customer requirements in enterprise environments and design software architecture following best practices.
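
To ground the pipeline bullet above, here is a minimal sketch of an extract-transform-load flow using sqlite3 and pandas; the database files, table, and column names are hypothetical placeholders, and a real engagement would substitute the appropriate cloud or warehouse connectors:

```python
# A minimal ETL sketch; "source.db", "warehouse.db", and the orders schema
# are hypothetical placeholders.
import sqlite3
import pandas as pd

# Extract: pull raw rows from the (hypothetical) source database.
with sqlite3.connect("source.db") as conn:
    raw = pd.read_sql(
        "SELECT order_id, customer_id, amount, order_date FROM orders", conn
    )

# Transform: coerce types and derive a reporting column.
raw["order_date"] = pd.to_datetime(raw["order_date"])
raw["amount"] = raw["amount"].astype(float)
raw["order_month"] = raw["order_date"].dt.to_period("M").astype(str)

# Load: write the cleaned table to the (hypothetical) target database.
with sqlite3.connect("warehouse.db") as conn:
    raw.to_sql("orders_clean", conn, if_exists="replace", index=False)
```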
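For the distributed-processing bullet, a minimal PySpark sketch of a similar aggregation at cluster scale, assuming a local Spark installation; the input file and column names are again hypothetical:

```python
# A minimal PySpark sketch of distributed aggregation; "orders.csv" and its
# columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("distributed-sketch").getOrCreate()

# Read the (hypothetical) order events into a distributed DataFrame.
orders = spark.read.csv("orders.csv", header=True, inferSchema=True)

# Aggregate revenue per customer across the cluster.
revenue = (
    orders.groupBy("customer_id")
    .agg(F.sum("amount").alias("total_revenue"))
)
revenue.show()
spark.stop()
```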
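And for the descriptive-analytics bullet, a sketch of running a SQL query and formatting the result set, against the same hypothetical warehouse table as above:

```python
# A minimal descriptive-analytics sketch; "warehouse.db" and the
# orders_clean schema are hypothetical placeholders.
import sqlite3
import pandas as pd

with sqlite3.connect("warehouse.db") as conn:
    summary = pd.read_sql(
        """
        SELECT order_month,
               COUNT(*)    AS n_orders,
               AVG(amount) AS avg_amount,
               SUM(amount) AS total_amount
        FROM orders_clean
        GROUP BY order_month
        ORDER BY order_month
        """,
        conn,
    )

# Format the result set for stakeholder-facing output.
summary["avg_amount"] = summary["avg_amount"].round(2)
print(summary.to_string(index=False))
```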
What you bring to the role
- Strong experience with ETL/ELT (extract, transform, load) of data from a variety of sources: SQL, NoSQL, Hadoop, Neo4j, MPP databases, etc.
- Ability to independently review data quality and data definitions, and perform data cleansing and data management tasks.
- Experience working in a multi-disciplinary team to tackle unstructured data processing problems across a diverse range of industries.
- Strong ability to use Python, Pandas, and NumPy for data analysis and cleansing tasks (a minimal cleansing sketch follows this list).
- Experience with visualization tools such as Tableau, Power BI, QlikView, etc.
- Strong understanding of Data Warehousing and Data Modeling.
- Experience or knowledge of at least one major cloud platform: AWS, Microsoft Azure, or GCP.
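
As a concrete reference for the Python/Pandas/NumPy bullet, a minimal cleansing sketch; the DataFrame and its columns are invented for illustration:

```python
# A minimal pandas/NumPy cleansing sketch; the data is hypothetical.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "age": [34, -1, -1, 58, 210],
    "email": [" a@x.com", "b@x.com", "b@x.com", None, "c@x.com"],
})

# Drop exact duplicates and normalize string fields.
df = df.drop_duplicates()
df["email"] = df["email"].str.strip()

# Flag out-of-range ages as missing, then impute with the median.
df["age"] = df["age"].where(df["age"].between(0, 120), np.nan)
df["age"] = df["age"].fillna(df["age"].median())
```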