Summary posted by: Sangam SwadiK
Event
This talk is for data scientists and ML engineers looking to serve their PyTorch models in production. It covers post-training steps that should be taken to optimize a model, such as quantization and TorchScript, and walks through packaging and serving the model with Facebook's TorchServe.
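As a minimal sketch of the two post-training optimizations the session covers, assuming a generic classifier in place of the talk's actual BERT model (the layer sizes below are placeholders):

```python
import torch

# Stand-in for a trained model; the talk uses BERT-style text classifiers.
model = torch.nn.Sequential(
    torch.nn.Linear(768, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 2),
)
model.eval()

# Post-training dynamic quantization: Linear weights become int8,
# trading a little accuracy for a smaller, faster CPU model.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# TorchScript via tracing: record the ops run on an example input,
# yielding a serialized module that no longer needs the Python interpreter.
example_input = torch.randn(1, 768)
traced = torch.jit.trace(quantized, example_input)
traced.save("model_quantized_traced.pt")
```

The talk also contrasts tracing with scripting (`torch.jit.script`), which compiles the model's Python source directly and so preserves data-dependent control flow that tracing would bake in.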
Video
Resources
- Repo: GitHub Repository
Section Timestamps of Video
- 00:00:00 About session
- 00:00:47 About Data Umbrella
- 00:04:18 Introduction
- 00:05:16 Session agenda
- 00:06:01 Machine learning at Walmart
- 00:12:11 Review of some deep learning concepts
- 00:15:24 BERT: Different architectures
- 00:16:07 Bi-LSTM vs BERT
- 00:21:59 Model inference
- 00:24:21 Load the model
- 00:25:21 Test prediction
- 00:28:01 Inference review (inference time vs. accuracy tradeoff)
- 00:29:17 BERT large
- 00:30:03 DistilBERT
- 00:33:54 Optimizing model for production
- 00:34:03 Post training optimization: Quantization
- 00:35:50 Types of Quantization
- 00:37:35 Quantization results
- 00:38:23 Post training optimization: Distillation
- 00:39:44 Distillation results
- 00:40:35 Eager execution vs Script mode
- 00:42:02 TorchScript JIT: Tracing vs Scripting
- 00:43:11 TorchScript Timing
- 00:45:21 Optimizing the model (Hands On)
- 00:47:36 Quantization (Hands On)
- 00:52:00 TorchScript (Hands On)
- 00:56:33 Deploying the model
- 00:57:13 Options for deploying a PyTorch model
- 00:57:42 Benefits of TorchServe
- 00:59:41 Packaging a model/MAR
- 01:00:00 PyTorch BaseHandler
- 01:03:00 Built-in handlers
- 01:04:15 Serving
- 01:05:10 APIs
- 01:05:32 Deploying the model (Hands On)
- 01:22:11 Lessons Learned
- 01:23:50 Q/A
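For the deployment segments above (BaseHandler, packaging a MAR, serving), here is a minimal sketch of a custom TorchServe handler; the class name and placeholder tensors are illustrative assumptions, not the repo's actual code:

```python
# handler.py -- hypothetical custom handler built on TorchServe's BaseHandler.
import torch
from ts.torch_handler.base_handler import BaseHandler


class TextClassifierHandler(BaseHandler):
    """Turns raw request text into model input and model scores into labels."""

    def preprocess(self, data):
        # TorchServe hands the handler a batch of requests; each payload
        # sits under "data" or "body". A real handler would tokenize here;
        # this placeholder only uses the batch size.
        texts = [row.get("data") or row.get("body") for row in data]
        return torch.zeros(len(texts), 768)  # placeholder input tensor

    def postprocess(self, inference_output):
        # Return one JSON-serializable prediction per request in the batch.
        return inference_output.argmax(dim=1).tolist()
```

The model and handler would then be bundled into a `.mar` archive and served with something like `torch-model-archiver --model-name classifier --version 1.0 --serialized-file model_quantized_traced.pt --handler handler.py --export-path model_store`, followed by `torchserve --start --model-store model_store --models classifier.mar`.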
About the Speaker
Bio
Nidhin Pattaniyil is a Machine Learning Engineer on the Search team at Walmart.
Connect with the Speaker
- Nidhin’s LinkedIn: Nidhin Pattaniyil
- Nidhin’s GitHub: @npatta01