Data Scientist (environmental sustainability reporting solutions) – 100% remote
Location: Czech Republic, Slovak Republic
Type: full-time
Sector: IT and Telecommunications
Position level: Specialist
Salary: from €3,000 gross per month (or based on your experience and expectations)
+421 907 459 976
Are you passionate about using data to drive change? Would you like to work on solutions that help over 10,000 organizations set and achieve environmental sustainability goals?
If so, we are looking for a Data Scientist who will be responsible for leading and delivering the goals of the company's new data platform. As a Data Scientist, you will design and deliver script migration, help build up the Azure data platform capabilities, and work closely with the Data & Insights Department to ensure cross-organization consistency. In addition, you will analyze, test, aggregate, and optimize data and present it to clients.
Key responsibilities include:
- Delivery of script migration from various on-prem version control systems and sources
- Re-development and enhancement of scripts to take advantage of the new data platform's capabilities (Databricks and the Azure stack)
- Delivery of automated data cleaning and structuring algorithms
- Assistance in 3rd party provisioning and preparation of data
- Translation of requirements into code
- Collaboration with cloud team on configuration and best practices of data cloud platform
- Translation of SQL code to Python
- Creation of pre- and post-processing scripts to fit current codebases with minimal or no alterations
- Performance tracking scripting and dashboarding
- Creation of Power BI insight dashboards with excellent visualizations
- Productionising analysis pipelines through a cloud toolset and hosting static and dynamic presentations of the generated insight
Benefits:
- 5 days of additional holiday
- 5 paid sick days
- Opportunity to participate in interesting projects
- Opportunity to develop professional knowledge and skills
- Opportunity to become part of an international team
- Hybrid work mode (flexibility to choose working on-site or remotely)
- Other benefits will be specified during the interview process
This role requires experience in building data pipelines, deploying models using the Azure stack, automating pipelines, and sourcing and preparing data together with the data engineering team. The successful candidate will need to demonstrate the ability to work and communicate effectively with others, including stakeholders and thematic teams, to ensure that processes are followed, deliverables are aligned to milestones, and outputs are built to agreed quality standards.
Key skills and experience:
- Minimum 3 years of experience using open-source languages and frameworks for large-scale analysis (Python, R, Spark, PySpark), working with data stores and formats (MongoDB, Parquet, Hive), and using SQL to query databases
- A strong mathematical and statistical background with a deep understanding of statistical inference, experimental design, sampling, and simulation
- Strong experience in training and productionizing machine learning models using both structured and unstructured data in big data pipelines on Azure
- Experience with well-known code libraries for data preprocessing (pandas, dplyr, tidyr, SciPy, Feature-engine, Beautiful Soup, Scrapy, spaCy, NLTK, TextBlob, fastText, Polyglot, requests, json, functools).
- Good technical communication & presentation skills in English.
- Ability to work in a matrix environment within a virtual team.
- Strong project experience with NLP, text analytics, and other relevant areas (e.g. text classification, topic detection, information extraction, named entity recognition, entity resolution, question answering, sentiment analysis, event detection, language modeling).
- Experience with managing and deploying models using Azure Databricks, Azure Data Lake, and Azure Data Factory
- Excellent data visualization skills using Power BI or similar tools.
- Experience with version control and shell scripting
The company is an international organization that drives companies and governments to reduce their greenhouse gas emissions, safeguard water resources, and protect forests through its platform, one of the richest sources of information on how companies and governments are driving environmental change globally.