Doing machine learning consulting projects is dangerous

When your client asks you to “Use Machine Learning to do X”, be very very careful with your project.

This is not like other programming consulting jobs. Building production machine-learning systems is not like standing up a website or building a GUI. You don’t just import sklearn and model.predict() your way to 💵🤑.

You are almost guaranteed to end up over budget, out of scope, and unhappy.

Four warning signs your ML gig is going to be a nightmare

They want you to use “AI”, but simple feature engineering and filtering would be sufficient.

They say they want to use ML, but there are no feedback loops from user behavior into improving the models.

They want you to use a fancy new algorithm without first trying simple ones.

They put all the burden on you: “you’re the expert, make it work.”

The missing parts in your Statement of Work

A programming consulting gig for a freelancer typically starts out like this:

Description: A high-level description of the problem from the client

Scope: You discuss business needs and technical requirements, outline risks and known unknowns.

Review existing code and related systems: Review existing codebase (if exists) and any new APIs or services you will need to interface with.

Proposal: Decompose project into Milestones, forecast development time and enumerate costs.

Negotiate: Agree on pay schedules, timelines, and acceptance criteria.

Code!

You are missing Feedback cycles!

Add in time and expectations for out-of-sample model evaluation

Add time in the schedule for at least three rounds of model improvement using feedback from real users and from the client
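As a rough illustration of what "out-of-sample" means in practice, here is a minimal sketch that holds out the most recent slice of the data and scores a model only on that slice. The names `chronological_split`, `accuracy`, and the `timestamp` / `label` keys are hypothetical, chosen for this example, and a real project may well need a more careful scheme (walk-forward evaluation, grouped splits, and so on).

```python
def chronological_split(rows, holdout_fraction=0.2, time_key="timestamp"):
    """Split rows into (train, holdout), where holdout is the most recent slice.

    Assumes each row is a dict with a sortable value under `time_key`.
    """
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    ordered = sorted(rows, key=lambda r: r[time_key])
    n_holdout = max(1, round(len(ordered) * holdout_fraction))
    cut = len(ordered) - n_holdout
    return ordered[:cut], ordered[cut:]


def accuracy(model_fn, rows, label_key="label"):
    """Fraction of rows where model_fn(row) equals the stored label."""
    if not rows:
        raise ValueError("cannot score an empty dataset")
    hits = sum(1 for r in rows if model_fn(r) == r[label_key])
    return hits / len(rows)
```

Fit on the first part, report the score on the second part, and put that second number (not the in-sample one) in front of the client.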

With ML-based projects, there is almost always more back-and-forth over results, more iteration, and more time required on the client’s side. Also, the quality of the result is probabilistic, and many people are not comfortable with false positives, false negatives, and stochastic outputs.

Components of a production ML Project

Here is a semi-organized set of questions you should think about when making a project proposal. If this is valuable to you, let me know on Twitter and I’ll make it better.

Sean Kruzel (@seankruzel): https://twitter.com/seankruzel

Data Sources

Who is managing the data ETL?
How is raw data updated?
How much history is available? Is there any survivorship bias in the data?
Do we have timestamps as to when the data was available (to prevent look-forward bias)?
How is the data cleaned? Does that cleaning introduce any look-forward bias?
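The timestamp question above can be made concrete with a point-in-time filter. This is only a sketch under assumptions: it supposes each record carries both an `event_time` (when the thing happened) and an `available_at` (when you could actually have known about it). Both field names and the `as_of` helper are hypothetical.

```python
from datetime import datetime


def as_of(records, cutoff, available_key="available_at"):
    """Return only the records that were already available at `cutoff`.

    Filtering on availability time, not event time, is what keeps
    late-arriving or revised data out of a historical training set.
    """
    return [r for r in records if r[available_key] <= cutoff]


# Hypothetical example: Q1 earnings describe March but are published in May.
records = [
    {"name": "q1_earnings", "event_time": datetime(2023, 3, 31),
     "available_at": datetime(2023, 5, 10)},
    {"name": "march_price", "event_time": datetime(2023, 3, 31),
     "available_at": datetime(2023, 3, 31)},
]
known_on_april_1 = as_of(records, datetime(2023, 4, 1))
```

If the data source cannot give you something like `available_at`, that is worth writing down as a known risk in the proposal.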

Feature Development

How are key features developed?
Where are features stored?
How is feature drift managed between in-sample and out-of-sample data?
Are the same features available in training as in deployment?
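One common way to put a number on feature drift is the Population Stability Index (PSI). It is not something this post prescribes, just a convenient example. The sketch below bins a feature using the training range and compares the bin fractions between training and live data. The function name `psi`, the bin count, and the `eps` floor are all assumptions for illustration.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a training sample and a live sample.

    Bins are equal-width over the training range; live values outside that
    range are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant features

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical distributions give a PSI of zero and the value grows as the live data moves away from training. What counts as "too much" drift is a threshold you should agree on with the client rather than take from this sketch.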

Model Development

Target Variables:

What are we predicting? Is it ever directly observed (like a future stock price or election result) or is it subjective (like the sentiment of a movie review)?
Do we require hand-labeled datasets?

At what accuracy?
How many?

Loss Function:

What type of error are we trying to minimize?
Do we care more about type I or type II errors?
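To make the type I vs type II question concrete, here is a small sketch that counts false positives and false negatives and prices them differently. The per-error costs are hypothetical placeholders; in a real engagement they are exactly the numbers you need the client to commit to.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (truthy = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred and truth:
            counts["tp"] += 1
        elif pred and not truth:
            counts["fp"] += 1  # type I error
        elif not pred and truth:
            counts["fn"] += 1  # type II error
        else:
            counts["tn"] += 1
    return counts


def weighted_error_cost(y_true, y_pred, fp_cost, fn_cost):
    """Total cost of errors when each error type has its own (assumed) price."""
    c = confusion_counts(y_true, y_pred)
    return c["fp"] * fp_cost + c["fn"] * fn_cost
```

Two models with the same accuracy can have very different costs once a missed positive is, say, five times as expensive as a false alarm.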

Model Specification

What types of predictive algorithms have been tried?
How do we reason about the trade-offs between model complexity and overfitting risk?
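One simple way to reason about that trade-off is to prefer the least complex model whose validation score is within a small tolerance of the best one. The sketch below assumes you have already recorded a train and validation score for each candidate. The `Candidate` class, the integer `complexity` ranking, and the tolerance value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    complexity: int      # any consistent ranking: 1 = simplest
    train_score: float   # higher is better
    val_score: float     # out-of-sample score, higher is better

    @property
    def overfit_gap(self):
        """How much better the model looks in-sample than out-of-sample."""
        return self.train_score - self.val_score


def pick_simplest(candidates, tolerance=0.01):
    """Pick the least complex candidate within `tolerance` of the best validation score."""
    best = max(c.val_score for c in candidates)
    eligible = [c for c in candidates if best - c.val_score <= tolerance]
    return min(eligible, key=lambda c: c.complexity)
```

A large `overfit_gap` on the fancy model is also a useful thing to show a client who insists on skipping the simple baselines.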

Model Deployment

How are you tracking model drift and data drift?
How are version control and auditing handled?
Are costs well understood (costs = money + compute + latency)?
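Latency is the easiest of those three costs to measure early. The following is a rough sketch that times repeated calls to a prediction function and reports the median and an approximate 95th percentile. `measure_latency` and `predict_fn` are hypothetical names, and a production benchmark would also need warm-up runs and realistic input sizes.

```python
import statistics
import time


def measure_latency(predict_fn, inputs, repeats=5):
    """Time predict_fn on every input, `repeats` times, and summarize in seconds."""
    samples = []
    for _ in range(repeats):
        for x in inputs:
            start = time.perf_counter()
            predict_fn(x)
            samples.append(time.perf_counter() - start)
    samples.sort()
    # Nearest-rank style index for the 95th percentile, clamped to the last sample.
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "n": len(samples),
        "p50_s": statistics.median(samples),
        "p95_s": samples[p95_index],
    }
```

Agreeing on a latency budget up front is usually cheaper than discovering it after the model is trained.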

What kind of ML project is this:

There’s existing ML production in place and you are adding a new model into production

A new project

Is there labeled data?

Yes:

Is it the right kind of label for the task at hand?
Is the dataset balanced / have enough labels of each kind?
What’s the accuracy / data quality of the labels?
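Two quick checks can turn the balance and label-quality questions into numbers: the fraction of each class, and how often two annotators agree beyond chance. The second uses Cohen's kappa, which is an example chosen here and not something the questions above require. Function names and the example labels are hypothetical.

```python
from collections import Counter


def class_balance(labels):
    """Fraction of the dataset carried by each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if both annotators labeled at random with their own rates.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

Having two people label the same small sample and computing their agreement is a cheap way to find out whether the labeling task is even well defined.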

Does the team have experience labeling training data?

If yes: do they have people trained for this dataset, and do those people have time available to label more data?
If no: who is going to FORCE them to be trained and FORCE them to label the data well? Honestly, people really hate labeling data, especially for the important and cognitively difficult tasks.

Loss Function?

Explicit or Implicit - Do we need to learn the loss function?
What are some positive and negative examples?

How is the dataset updated?
How is deployment managed?
What are the budgets for training / compute?