As a data engineer, your focus is wrangling messy, unstructured data into structured datasets that will form the foundation of our next generation of machine learning algorithms. You will work collaboratively with our full-stack developers, machine learning scientists, and the rest of our multidisciplinary team to deliver data via our distributed systems and/or APIs.
In this role, you will be responsible for understanding how data must be structured and for creating the scripts to extract, load, and transform it. You will also extend and optimize our data pipeline architecture and data flow. As such, you take ownership of and pride in your work: from the first line of code to the writing of tests and monitoring solutions.
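The extract-load-transform responsibility described above can be sketched as a small Python script. This is a minimal illustration using only the standard library; the record format, field names, and functions here are assumptions for the example, not Katana's actual schema or pipeline code:

```python
import io
import json
from datetime import datetime, timezone

def extract(raw_lines):
    """Extract: yield raw, possibly messy records from a source (here, text lines)."""
    for line in raw_lines:
        line = line.strip()
        if line:  # skip blank lines
            yield line

def transform(raw_record):
    """Transform: parse a 'timestamp | user | value' line into a structured dict.

    Returns None for records that cannot be parsed, so bad rows are dropped
    rather than crashing the pipeline.
    """
    parts = raw_record.split("|")
    if len(parts) != 3:
        return None
    ts, user, value = (p.strip() for p in parts)
    try:
        return {
            "timestamp": datetime.fromisoformat(ts).astimezone(timezone.utc).isoformat(),
            "user": user,
            "value": float(value),
        }
    except ValueError:
        return None

def load(records, sink):
    """Load: write structured records as JSON lines to any writable file object."""
    count = 0
    for rec in records:
        sink.write(json.dumps(rec) + "\n")
        count += 1
    return count

# Example run with hypothetical input data:
raw = [
    "2024-05-01T12:00:00+00:00 | alice | 3.5",
    "not a valid row",
    "2024-05-01T13:00:00+00:00 | bob | 7",
]
sink = io.StringIO()
structured = [r for r in (transform(x) for x in extract(raw)) if r is not None]
n = load(structured, sink)  # the malformed row is dropped
```

In a production pipeline these three stages would typically become Apache Beam `PTransform`s running on Dataflow, with Pub/Sub as the source and BigQuery as the sink, but the shape of the logic is the same.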
Within Katana we mainly use the following technologies and frameworks:
- Apache Beam
- GitLab CI
- Google Cloud Platform (GKE, Dataflow, Pub/Sub, BigQuery)
- Kubernetes / Docker
Do you recognize yourself in this profile?
- Bachelor’s Degree in Software Engineering / AI / Computer Science;
- Strong programming skills in Python with Apache Beam;
- Background and interest in statistical analysis;
- Proven proficiency in cloud technologies; experience with Google Cloud Platform is a strong plus;
- A DevOps mindset: you take ownership from the first line of code to monitoring in production;
- A learning attitude, not only toward mastering new technologies and programming languages, but also on the interpersonal level: you have shown you can ask for and give feedback;
- You feel at home in a high-performing team and make the other team members feel at home as well, and you have the independence to speak up when needed;
- Solid understanding of Git and CI/CD;
- Strong problem-solving skills;
- Fluent in written and spoken English.
Nice to have:
- Experience with machine learning pipelines (ideally with real implementations), or at least a strong affinity with them;
- Knowledge of machine learning;
- Experience with container-based development and Kubernetes;
- Good understanding of big data technologies like Hadoop, Spark;
- Good understanding of streaming technologies like Google Pub/Sub and Apache Beam;
- Good understanding of databases, both RDBMS (e.g., PostgreSQL) and NoSQL (e.g., BoltDB);
- Good understanding of engineering tools like Docker and Kubernetes;
- Experience building zero-downtime applications/components;
- Good understanding of automated testing frameworks;
What do we offer?
Working at Katana means working in a dynamic and international setting. We value the individual development of our employees, which is why we offer excellent courses and programs. If you join our team, you will become part of a prestigious, no-nonsense, high-output data science & engineering team. Diva behavior is not tolerated, and neither is underperformance; we only hire people with exceptional talent and capabilities. You will work on the most innovative projects within Katana. In addition, we offer:
- A remote position
- Travel to meet team members every 4-6 weeks
- Working with highly skilled people
- Working in a startup environment
- A relaxed and energetic team
- An international atmosphere
- A full-time position (40-hour week)
- Great training and education opportunities
- Developing a cloud-native application