Platform Infrastructure Engineer
Highly valued qualifications & experiences
- Experience introducing new technology with zero downtime, including data migration
- Fan of automated testing and qualification
- Available to work (remotely) outside regular office hours when an attempt to build a fail-safe system turns out not to have succeeded yet. We really want this to be the exception, not the rule
Required qualifications & experiences
- Knowledge of distributed computing systems, backed by practical experience (a must!)
- Ansible playbooks and programming
- DC/OS and Kubernetes: configuration, not just development on top of them
- HBase and Hadoop as a very nice to have: configuration, troubleshooting, failover, replication
- Spark, Kafka topics, ZooKeeper for management
- Experienced in CI/CD, including Git, test automation, etc.
- Familiar with at least one scripting language (e.g. Python)
- Solid experience with Ansible and Kubernetes, see below.
- You like to solve problems (permanently)
- You are open to challenges
- You think outside the box
- You can look through the customer's eyes
- You automate everything
- You have a positive attitude
We also value
- Collaboration with stakeholders
- Curiosity: wanting to understand how the system works
- Ability to dive deep into a specific topic
- Keeping in mind that we are not Netflix: we tend to choose proven technology over the latest and greatest in order to keep meeting the 4 nines
- Being able to combine the individual elements and requests into a system design
- Having fun
You will be working as an engineer on the Virtual Compute Platform (VCP). This platform is developed within our client to host compute and analytics applications that aim to improve the yield in the semiconductor factories of our customers.
These applications take data from our client's scanners and YieldStar equipment. They combine this data into real-time production corrections and scanner process diagnostics. The corrections are sent back to the production equipment. A failure of the platform has high impact: it would mean failure of the customer's (TSMC, Samsung, Intel, etc.) production facility.
The platform is currently based on DC/OS. In 2021 we started the migration to Kubernetes. We develop the platform aspects in our team: scheduling of resources, containerization, failover and data collection from scanners and measurement devices inside the fab. We have an uptime expectation of 4 nines. As a true distributed computing expert you will have your own view on such a baseline expectation. This might be a nice topic to discuss during an interview.
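To put that expectation in perspective, here is a minimal sketch of the downtime budget that 99.99% availability leaves. It assumes a 365-day year and counts every minute of unavailability against the budget; the function name and the choice of periods are illustrative only, not part of the platform.

```python
def downtime_budget_minutes(availability: float, period_hours: float) -> float:
    """Return the downtime (in minutes) allowed by an availability target over a period."""
    return (1.0 - availability) * period_hours * 60.0


FOUR_NINES = 0.9999  # "4 nines" expressed as a fraction

# Assumption: a 365-day year and a 30-day month, with no planned-maintenance exemptions.
per_year = downtime_budget_minutes(FOUR_NINES, 365 * 24)
per_30_days = downtime_budget_minutes(FOUR_NINES, 30 * 24)

print(f"{per_year:.1f} min/year, {per_30_days:.1f} min/30 days")
# prints "52.6 min/year, 4.3 min/30 days"
```

Roughly 53 minutes per year is a tight budget for upgrades and failovers, which is why zero-downtime rollouts and automated qualification matter so much in this role.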
Installations and upgrades are automated with Ansible. Other technologies you may encounter are Spark for data processing and Kafka for notifications and high-volume data ingestion. Hadoop and HBase are used for data storage. We are open to your well-substantiated input on stable alternatives for these technologies where they better suit the business case.
- Design and implement the product with the team
- Automated tests accompany every delivery
- Help application developers understand the infrastructure / cluster / system
- Make the VCP reliable by improving system resilience (bug-fixing and beyond)
Participate in the development of our distributed data and compute platform infrastructure. Be accurate, be precise and own the specification, design and implementation of features and fixes. Onboard, integrate and configure open-source or other packages that support the development of semiconductor process tuning applications on the platform. Support installation of these platforms in Korea, Taiwan, Israel, China, the US and elsewhere.
Be part of this compute platform, one of the main pillars under the production of the next-generation microchips of Apple, Samsung and many others.
Context of the position
You will be working at Business Line Applications (BL Apps). BL Apps develops Analytics & Control solutions that improve the accuracy of performance metrics (such as overlay, focus, critical dimension) as measured on the end product of a fab process (wafers with chip structures). You will work on the platform underneath these processing algorithms, a distributed computing platform. This platform will provide value to our client's customers all over the world, making sure the chips of the next generation are produced efficiently and with the highest quality.
There are 3-4 infra teams, with 20-30 engineers, Product Owners and Scrum Masters, working on the platform layers.
There are 15-25 application development teams building the business-critical applications.
Keywords: Ansible, DC/OS, D2iQ, Mesosphere, HDFS, MongoDB, Docker, UCR, Spring Boot, Splunk, Linux, HDP, Nexus, JIRA, Scrum, RHEV, RHEL, Commvault, Keycloak
Please send us your recent CV and a cover letter for this role (both in English), together with your availability, any planned vacations and your all-in hourly rate excluding VAT (BTW).