Voxxed Days Melbourne 2019
from Monday 13 May to Tuesday 14 May 2019.
Derrick Cheng is a data engineer at Zendesk, working in a team that builds machine learning products. He has experience as a full-stack engineer building modern web applications, and he is passionate about data processing pipelines and cloud technologies.
See also https://medium.com/@chengderrick
I will be co-presenting this talk with my colleague Derrick Cheng (Zendesk).
This talk covers how we scaled our model building infrastructure at Zendesk, with the aim of building at least 50,000 models a day. This work is part of our effort to deliver a machine learning (ML) product called Content Cues.
Content Cues summarises text from customer support tickets to form insightful topics. It combines multiple ML algorithms, including deep learning, clustering and other natural language processing approaches. These algorithms are run every day over data from tens of thousands of eligible Zendesk customers.
We will talk about:
- how we implemented a horizontally scalable model building pipeline by combining AWS EMR, AWS Batch and Kubernetes
- how to balance model performance, scalability and computational efficiency
- some real-world problems and scaling complexities you may encounter when building an ML product at web scale