Summary posted by: Sangam SwadiK
Event
This talk is for data scientists and ML engineers looking to serve their PyTorch models in production. It covers post-training steps that should be taken to optimize a model, such as quantization and TorchScript, and walks through packaging and serving the model with Facebook's TorchServe.
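As a minimal sketch of the two post-training optimizations the session covers, assuming a generic classifier in place of the talk's actual BERT model (the layer sizes below are placeholders):

```python
import torch

# Stand-in for a trained model; the talk uses BERT-style text classifiers.
model = torch.nn.Sequential(
    torch.nn.Linear(768, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 2),
)
model.eval()

# Post-training dynamic quantization: Linear weights become int8,
# trading a little accuracy for a smaller, faster CPU model.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# TorchScript via tracing: record the ops run on an example input,
# yielding a serialized module that no longer needs the Python interpreter.
example_input = torch.randn(1, 768)
traced = torch.jit.trace(quantized, example_input)
traced.save("model_quantized_traced.pt")
```

The talk also contrasts tracing with scripting (`torch.jit.script`), which compiles the model's Python source directly and so preserves data-dependent control flow that tracing would bake in.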
Video
Resources
- Repo: GitHub Repository
Section Timestamps of Video
- 00:00:00 About session
- 00:00:47 About Data Umbrella
- 00:04:18 Introduction
- 00:05:16 Session agenda
- 00:06:01 Machine learning at Walmart
- 00:12:11 Review of some deep learning concepts
- 00:15:24 BERT: Different architectures
- 00:16:07 Bi-LSTM vs BERT
- 00:21:59 Model inference
- 00:24:21 Load the model
- 00:25:21 Test prediction
- 00:28:01 Inference review (inference time vs. accuracy tradeoff)
- 00:29:17 BERT large
- 00:30:03 DistilBERT
- 00:33:54 Optimizing model for production
- 00:34:03 Post training optimization: Quantization
- 00:35:50 Types of Quantization
- 00:37:35 Quantization results
- 00:38:23 Post training optimization: Distillation
- 00:39:44 Distillation results
- 00:40:35 Eager execution vs Script mode
- 00:42:02 TorchScript JIT: Tracing vs Scripting
- 00:43:11 TorchScript Timing
- 00:45:21 Optimizing the model (Hands On)
- 00:47:36 Quantization (Hands On)
- 00:52:00 TorchScript (Hands On)
- 00:56:33 Deploying the model
- 00:57:13 Options for deploying a PyTorch model
- 00:57:42 Benefits of TorchServe
- 00:59:41 Packaging a model/MAR
- 01:00:00 PyTorch BaseHandler
- 01:03:00 Built-in handlers
- 01:04:15 Serving
- 01:05:10 APIs
- 01:05:32 Deploying the model (Hands On)
- 01:22:11 Lessons Learned
- 01:23:50 Q/A
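For the deployment segments above (BaseHandler, packaging a MAR, serving), here is a minimal sketch of a custom TorchServe handler; the class name and placeholder tensors are illustrative assumptions, not the repo's actual code:

```python
# handler.py -- hypothetical custom handler built on TorchServe's BaseHandler.
import torch
from ts.torch_handler.base_handler import BaseHandler


class TextClassifierHandler(BaseHandler):
    """Turns raw request text into model input and model scores into labels."""

    def preprocess(self, data):
        # TorchServe hands the handler a batch of requests; each payload
        # sits under "data" or "body". A real handler would tokenize here;
        # this placeholder only uses the batch size.
        texts = [row.get("data") or row.get("body") for row in data]
        return torch.zeros(len(texts), 768)  # placeholder input tensor

    def postprocess(self, inference_output):
        # Return one JSON-serializable prediction per request in the batch.
        return inference_output.argmax(dim=1).tolist()
```

The model and handler would then be bundled into a `.mar` archive and served with something like `torch-model-archiver --model-name classifier --version 1.0 --serialized-file model_quantized_traced.pt --handler handler.py --export-path model_store`, followed by `torchserve --start --model-store model_store --models classifier.mar`.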
About the Speaker
Bio
Nidhin Pattaniyil is a Machine Learning Engineer on the Search team at Walmart.
Connect with the Speaker
- Nidhin’s LinkedIn: Nidhin Pattaniyil
- Nidhin’s GitHub: @npatta01