What will be your main challenges? (Responsibilities)
- Research and identify public and government data sources.
- Extract, transform, and normalize data from websites, APIs, feeds, FTP sources, and online repositories.
- Design and build reusable, scalable, and maintainable ETL processes and workflows (no one-off scripts!).
- Apply advanced web scraping techniques using Python, HTTP requests, and HTML parsing.
- Quality assurance: Identify inconsistencies, validate data samples, and document methodologies and processes.
- Collaboration and version control: Maintain Git repositories following software development best practices, and communicate clearly with stakeholders.
What are we looking for? (Requirements)
- Solid experience in web scraping, data scraping, and structured/unstructured data extraction.
- Technical proficiency: Hands-on experience programming in Python or a similar language, plus knowledge of APIs, HTTP, FTP, HTML parsing, and relational databases such as PostgreSQL.
- Language: Advanced English, with fluent written and technical communication.
- Analytical mindset: Ability to solve complex data acquisition problems, optimize solutions, and work independently while taking on technical challenges.
- Quality focus: Strong emphasis on data validation, normalization, and documentation.