A load balancer that learns, WebTorch

In my previous blog post, “How I stopped worrying and embraced docker microservices”, I talked about why microservices are the bee’s knees for scaling Machine Learning in production. A fair amount of time has passed since then (almost a year, whoa), and it has become clear that building Deep Learning pipelines in production is a more complex, multi-faceted problem. Yes, microservices are an amazing tool for software reuse, distributed systems design, quick failure and recovery, yada yada. But what seems very obvious now is that Machine Learning services are very stateful, and statefulness is a problem for horizontal scaling.

Context switching latency

An easy way to deal with this issue is to recognize that ML models are large, and thus should not be context switched. If a model is started on instance A, you should try to keep it on instance A as long as possible. Nginx Plus comes with support for sticky sessions, which means that requests from a given client can always be load balanced to the same upstream, a super useful feature. That was 30% of the message of my Nginxconf 2017 talk.
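For illustration, here is a minimal sketch of what that looks like in an Nginx Plus configuration (the upstream name and addresses are made up; the `sticky` directive is a real Nginx Plus feature, but check the docs for your version):

```nginx
# Hypothetical upstream group of ML model servers. The 'sticky' directive
# (Nginx Plus only) pins each client to the instance that already has its
# model loaded in memory, so the model never gets context switched.
upstream model_servers {
    sticky cookie srv_id expires=1h;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

server {
    listen 80;
    location /predict {
        proxy_pass http://model_servers;
    }
}
```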

The other 70% of my message was urging people to move AWAY from microservices for Machine Learning. In an extreme example, we announced WebTorch, a full-on Deep Learning stack on top of an HTTP server, running as a single program. For your reference, a Deep Learning stack looks like this.

(Figure: the pipeline required for Deep Learning in production. “What is this data, why is it so dirty, alright now it’s clean but my neural net still doesn’t get it, finally it gets it!”)

Now consider the two extremes in implementing this pipeline:

  1. Every stage is a microservice.
  2. The whole thing is one service.

Both seem equally terrible, for different reasons, and here I will explain why designing an ML pipeline is a zero-sum problem.

Communication latency

If every stage of the pipeline is a microservice, you introduce a huge communication overhead between the microservices. This is because the very large dataframes that are passed between services also need to be:

  1. Serialized
  2. Compressed (+ Encrypted)
  3. Queued
  4. Transferred
  5. Dequeued
  6. Decompressed (+ Decrypted)
  7. Deserialized

What a pain, what a terrible thing to spend cycles on. All of these steps have to be repeated every time a microservice boundary is crossed. The horror, the terrible end-to-end performance horror!
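To get a feel for steps 1 and 7 alone, here is a minimal Lua sketch using Torch’s built-in serializer (the tensor size is arbitrary, and compression, encryption, and queueing aren’t even included):

```lua
-- Measure just the serialize/deserialize cost of moving one large
-- "dataframe" (here a Torch tensor) across one microservice boundary.
require 'torch'

local df = torch.DoubleTensor(10000, 1000):uniform()  -- ~80 MB of doubles

local timer = torch.Timer()
local bytes = torch.serialize(df)       -- step 1: serialize
local back  = torch.deserialize(bytes)  -- step 7: deserialize
print(string.format('serialize + deserialize: %.3f s for %.1f MB',
                    timer:time().real, #bytes / 2^20))
```

Multiply that by the number of boundaries in the pipeline, and by every request, and the wasted cycles add up quickly.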

In the opposite case, you’re writing a monolith that is hard to maintain; you’re probably stuck with uncomfortable semantics for either the HTTP server or the ML part, you can’t monitor the in-between stages, etc. Like I said, writing an ML pipeline for production is a zero-sum problem.

An extreme example: All-in-one deep learning

(Figure: Venn diagram of Torch and Nginx. They have one thing in common: the amazing LuaJIT.)

That’s right: you’ll need to look at your use case and decide where you draw the line. Where does the HTTP server stop and where does the ML back-end start? If only there were a tool that made this decision easy and allowed you even to go to the extreme of writing a monolith, without sacrificing either HTTP performance (and pretty HTTP server semantics) or ML performance and relevance in the rapidly growing Deep Learning market. Now such a tool is here (in alpha), and it’s called WebTorch.

WebTorch is the freak child of the fastest, most stable HTTP server, nginx, and the fastest, most relevant Deep Learning framework, Torch.

Now of course that doesn’t mean WebTorch is either the best-performing HTTP server or the best-performing Deep Learning framework, but it’s at least worth a look, right? So I ran some benchmarks: I loaded the XOR neural network found on the Torch training page and used another popular LuaJIT-scriptable tool, wrk, to benchmark my server, sending serialized Torch 2D DoubleTensors in POST requests to train on. Here are the results:

Huzzah! Over 1000 req/sec on my MacBook Air, with no CUDA support and 2 Intel cores!
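For reference, the model under test is the classic XOR network from the Torch training tutorial; a minimal sketch (layer sizes per the tutorial, training loop elided):

```lua
-- The XOR network from the Torch training tutorial: 2 inputs,
-- one hidden layer of 20 Tanh units, 1 output, trained with MSE.
require 'nn'

local mlp = nn.Sequential()
mlp:add(nn.Linear(2, 20))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(20, 1))

local criterion = nn.MSECriterion()
```

And since wrk itself is scripted in Lua, the load generator can be as simple as something like this (the tensor file and content type are my own choices, not part of wrk):

```lua
-- Hypothetical wrk script: POST a pre-serialized 2D DoubleTensor
-- to the training endpoint on every request.
wrk.method = "POST"
wrk.body   = io.open("tensor.bin", "rb"):read("*a")
wrk.headers["Content-Type"] = "application/octet-stream"
```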

So there: plug that into a CUDA machine and see how much performance you can squeeze out of that bad boy. I hope I have convinced you that sometimes mixing two great things CAN lead to something great, and that WebTorch is an ambitious and interesting open source project!

And hopefully, in due time, it will become a fast, production-level server that makes it easy for data scientists to deploy their models in the cloud (do people still say cloud?) and for DevOps people to deploy and scale them.

Possible applications of such a tool include, but are not limited to:

  • Classification of streaming data
  • Adaptive load balancing
  • DDoS attack/intrusion detection
  • Detect and adapt to upstream failures
  • Train and serve NNs
  • Use cuDNN, cuNN and cuTorch inside NGINX
  • Write GPGPU code on NGINX
  • Machine learning NGINX plugins
  • Easily serve GPGPU code
  • Rapid prototyping of Deep Learning solutions

Maybe your own?
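To make a few of these concrete: because both sides speak LuaJIT, an inference endpoint could in principle look something like the sketch below. To be clear, this is OpenResty-style pseudocode of my own; the `ngx` API usage and the model path are assumptions, not WebTorch’s documented interface.

```lua
-- Hypothetical: serve a pre-trained Torch model from inside the
-- HTTP server itself, with no serialization hop to a separate ML service.
require 'torch'
require 'nn'

local mlp = torch.load('/models/xor.t7')  -- assumed path to a saved model

ngx.req.read_body()
local input  = torch.deserialize(ngx.req.get_body_data())
local output = mlp:forward(input)
ngx.say(torch.serialize(output))
```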

Announcing the UnifyID Spring AI Fellowship

Today, we would like to announce the UnifyID AI Fellowship program for Spring 2017. This is the second edition of the fellowship (following the Fall 2016 cohort) and is expected to run for 12 weeks, February 23 through May 18. This selective, cross-disciplinary program covers the following areas:

  • Deep Learning
  • Signal Processing
  • Optimization Theory
  • Sensor Technology
  • Mobile Development
  • Statistical Machine Learning
  • Security and Identity
  • Human Behavior
  • UX/UI Development for the above areas
  • Tech Journalism for the above areas
  • Special Focus:

We will be assigning one fellow to work on fakenewschallenge.org in collaboration with Dr. Dean Pomerleau of the Carnegie Mellon University Robotics Institute. If interested, please add a note in your application. We expect applicants for this project to have substantial experience handling textual data, as well as NLP expertise; the application should include links to previous work in this domain.


FELLOWSHIP DETAILS

Each UnifyID AI Fellow will initially be assigned a well-defined project matched to their area of interest and expertise, along with a fellowship mentor. Fellows then have one week to collaborate with their mentor and come up with an 11-week timeline roughly detailing the path they plan to take to achieve the project’s end goals.

During the fellowship, fellows are expected to convene in person and present weekly updates every Thursday evening at our office in SoMa, San Francisco. In exceptional cases, individuals will be allowed to present via video chat. Missing these update sessions for two consecutive weeks will result in automatic removal from the fellowship.

All selected fellows will be awarded:

  1. Lifelong designation as a UnifyID AI Fellow.
  2. A fellowship stipend.
  3. Access to state-of-the-art GPU hardware and $360,000 in Microsoft Azure cloud service credits.
  4. Access to our office space in SoMa.
  5. A prepaid Clipper card to help with commuting to/from the office.
  6. A chance to collaborate and publish with top-tier security experts from MIT, Stanford, CMU, Berkeley, Dartmouth, etc.
  7. Conference registration fees for all of the publications that emanate from the fellowship.
  8. Travel expenses for one flagship top-tier conference, in case the fellow’s work is accepted for publication.
  9. A citation and certificate commemorating your achievement.
  10. Exclusive UnifyID Fellow swag.
  11. A chance to present at the UnifyID Tech-expo Day in May 2017.


DELIVERABLES

  1. A short paper describing the project.
  2. A detailed, well-commented code submission on either ai-on.org or http://www.gitxiv.com (in case you have an arXiv-worthy submission).
  3. A one-page blog post providing a less technical version of the project details. ($ ipython nbconvert --to markdown notebook.ipynb --stdout will do!)
  4. A final presentation in .ppt or .pdf format during the UnifyID Tech-expo Day.

For some of the projects, we may also munge certain openly available datasets and upload them, with associated open problems, to ai-on.org if the fellow is limited by the timeline of the fellowship.


REQUIREMENTS

We welcome applications from practitioners and tech enthusiasts, as well as students at both the undergraduate and graduate levels, preferably from the SF Bay Area.


Tracks | Languages | Libraries/Platforms/Frameworks
Machine Learning | Python, Lua, Julia, R, Scala, Java | Scikit-learn, Torch/Autograd, Caffe, Keras with Theano/TensorFlow, Chainer
Mobile Development | Swift, Objective C, Java | Core Location, Core Motion, Core Bluetooth, DeepLearningKit, Accelerate: BNNS, CoreAudio/AudioKit
Security | C, C++, JavaScript | AES, RSA, ECDSA, PKI, Functional Encryption, Enclaves (SGX)
UX/UI Development | (Portfolio Review) |
Tech Journalism | (Portfolio Review) |

Please apply here with the following:

  1. Resume
  2. A personal statement (no longer than 250 words) explaining what you expect to achieve with this fellowship.
  3. A 5-slide presentation (ppt or pdf) detailing your most cherished accomplishment in the area you are applying to (with links to publication(s), GitHub code-base, live-project link, etc.).


UnifyID AI Fellowship

San Francisco, CA

Program Weekend Dates: February 23 – May 18, 2017

Application due date: January 31, 2017, 11:59 PM (PST)

Announcing the UnifyID AI Fellowship

Today, we would like to announce the UnifyID AI Fellowship program for Fall 2016. The fellowship runs for six weeks, from October 28, 2016 through December 4, 2016. This selective, cross-disciplinary program covers the following areas:

  • Deep Learning
  • Signal Processing
  • Optimization Theory
  • Sensor Technology
  • Mobile Development
  • Statistical Machine Learning
  • Security and Identity
  • Human Behavior

Our UnifyID AI Fellows will get to choose one of 16 well-defined projects in the broad area of applied artificial intelligence, in the context of solving the problem of seamless personal authentication.

All selected fellows will be awarded:

  1. A fellowship stipend.
  2. Access to state-of-the-art GPU hardware and $360,000 in Microsoft Azure cloud service credits.
  3. Weekend access to our office space in SoMa, as well as as-needed access on weekdays.
  4. A prepaid Clipper card to help with commuting to/from the office.
  5. A chance to collaborate and publish with top-tier security experts from MIT, Stanford, CMU, Berkeley, Dartmouth, etc.
  6. A citation, certificate, and plaque commemorating your achievement.
  7. Exclusive UnifyID Fellow signature bags and sweatshirts for the Fall 2016 inaugural class.
  8. A chance to present at the UnifyID Tech-expo Day in December 2016.

We expect the work from your Fellowship to result in either a publication (with fully open-sourced code and data repository on GitHub for reproducible research) or a patent filing.


REQUIREMENTS

We welcome applications from practitioners, hackers, and tech enthusiasts, as well as students in full-time accredited academic programs at both the undergraduate and graduate levels, preferably from the SF Bay Area. An ideal candidate has both math and coding chops, but more importantly is an engineer, signal processor, hacker, and self-proclaimed guru who is comfortable crafting, hacking, implementing, re-implementing, and breaking Machine Learning algorithms, deep, shallow, or otherwise.

Tracks | Machine Learning | Mobile Dev.
Languages | Python, Lua, Julia, R, Scala, Java | Swift, Objective C, Java
Libraries/Platforms/Frameworks | Scikit-learn, Torch/Autograd, Caffe, Keras with Theano/TensorFlow, Chainer | Core Location, Core Motion, Core Bluetooth, DeepLearningKit, Accelerate: BNNS, CoreAudio/AudioKit
OS | Ubuntu, OS X, RHEL / CentOS / Fedora | iOS, Android

Please apply here and include, in the open form field, a personal statement (no longer than 250 words) explaining what you expect to achieve with this fellowship, along with your favorite moment in the sun (publication, GitHub code base, live project link).


UnifyID AI Fellowship

San Francisco, CA

Program Weekend Dates: October 28 – December 4, 2016

Application due date: October 17, 2016, 11:59 PM (PDT)