Elements of Machine Learning Engineering

@ Georgia Tech, Coming Soon

Time: Friday; 7:00 PM to 8:00 PM
Location: Zoom
Office Hours: Friday; 2:00 PM to 3:00 PM (Optional)

Note: Syllabus and seminar schedule are subject to change. Any changes to the syllabus and / or seminar schedule after the semester begins will be communicated to students on Canvas or Ed. Please follow these communication channels closely throughout the semester.

Learning Outcomes

You’ve learned all about various machine learning algorithms and their theoretical underpinnings during your time in the OMSCS program. It’s now time to apply for industry jobs and take your knowledge from toy problems to production.

The purpose of this seminar is to bridge the gap between the courses of the OMSCS Machine Learning Specialization and deploying their underlying techniques in the wild. After completing this seminar, students should be comfortable navigating technical interviews, participating in the machine learning engineering process, architecting data pipelines, and maintaining production models.

In the same way that students first encounter programming through introductory programming courses and productionize that knowledge by taking a software engineering course later in a typical Computer Science curriculum, this seminar serves as analogous glue for the ML curriculum.

We’ll focus closely on case studies and examples to contextualize concepts as we go. The first half of the seminar series will define and introduce the core components of the MLE discipline. The second half will focus on taking algorithms you’re already familiar with and recontextualizing their use from an MLE perspective.

Recommended Courses

There is no strict requirement to take any courses before this seminar, since it will be fairly self-contained. Ideally, however, you will have taken at least CS 7641 so that you’re familiar with some of the algorithmic concepts beforehand.

Recommended Text

The recommended text for this seminar is optional. Although it’s only tangentially related to machine learning itself, I find it extremely helpful for wrapping your head around designing distributed (data) systems, which is a primary consideration at every step of the MLE process.

Designing Data-Intensive Applications (DDIA); Kleppmann, 2017; O’Reilly Media.

Schedule

  • Week 1: Prototyping Models (Slides)
    • DDIA Chapter 1 (Reliable, Scalable, and Maintainable Applications)
    • Intake Survey (Optional)
  • Week 2: The MLE Lifecycle (Slides)
    • DDIA Chapter 3 (Storage and Retrieval)
  • Week 3: Designing Data Pipelines (Slides)
    • DDIA Chapter 5 (Replication)
  • Week 4: Model Deployment and Maintenance (Slides)
    • DDIA Chapter 6 (Partitioning)
  • Week 5: All About Data (Slides)
    • DDIA Chapter 7 (Transactions)
    • Project Released
  • Week 6: Regression (Slides)
    • DDIA Chapter 8 (The Trouble with Distributed Systems)
  • Week 7: Tree-Like Algorithms, Ensembles (Slides)
    • DDIA Chapter 9 (Consistency and Consensus)
  • Week 8: Support Vector Machines (Slides)
    • DDIA Chapter 10 (Batch Processing)
  • Week 9: Neural Networks, Deep Learning (Slides)
    • DDIA Chapter 11 (Stream Processing)
  • Week 10: Scheduled Presentations
    • Exit Survey (Optional)

Project and Grading

There will be a single project towards the end of the semester that consists of two deliverables: a written 10-page report and a corresponding presentation on that report. Presentation of your own material is required (hopefully this is not really a surprise).

The purpose of this project is to apply the knowledge learned earlier in the seminar by creating a system design report for a domain and corresponding problem of your choosing. You’ll take your chosen problem from an initial objective to a full-fledged system.

You may optionally attend other students’ presentations if you find them interesting and useful. Students will have a chance to pick one of many slots in Week 10 for their final presentation. The finalized schedule will be made available to all registered students.

Grading for the course will be pass / fail and based on these two deliverables. A rubric will be provided to students before Week 10 including grading criteria for the project and pass / fail criteria.

Regrading Policy

If you are convinced that your score is in error in light of the feedback you received, you may request a regrade within a week of the score and feedback being returned to you. A regrade request is only valid if it includes an explanation of where I made an error.

More particulars on submitting these requests will be available on Ed as the seminar progresses.

Students with Disabilities

If you have any accommodations, please inform me as soon as possible, and provide me with the detailed accommodation approval letter from the GT Office of Disability Services.

Academic Integrity Policy

Plagiarizing is defined by Webster’s as “to steal and pass off (the ideas or words of another) as one’s own : use (another’s production) without crediting the source.” If caught plagiarizing, you will be dealt with according to the GT Academic Honor Code.

When working on your project, you may not work with other students; doing so is a violation of the GT Academic Honor Code. Submitting any work other than your own is also a violation of the Academic Honor Code.

For any questions involving these or any other Academic Honor Code issues, please consult me or this website.