projects with data

caleb works on machine learning, ai, and data science

global plastic watch

Plastic waste is a significant environmental pollutant that is difficult to monitor. We created a system of neural networks to analyze spectral, spatial, and temporal components of Sentinel-2 satellite data to identify terrestrial aggregations of waste. The system works at wide geographic scale, finding more than 4,000 waste sites in 112 countries across Southeast Asia.
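As a minimal illustration of a per-pixel spectral feature with a temporal component, the sketch below computes a median NDVI composite from a stack of co-registered Sentinel-2 scenes. NDVI uses the near-infrared (B8) and red (B4) bands. Choosing NDVI is an assumption made for illustration, on the idea that waste sites often displace vegetation. It is not a description of the published networks, and the function names are hypothetical.

```python
from statistics import median


def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 when both reflectances are zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def ndvi_composite(nir_series, red_series):
    """Median NDVI per pixel over a time series of co-registered scenes.

    Illustrative sketch only, not the published method.
    nir_series, red_series: lists of scenes, each scene a 2D list (rows of
    reflectances) for Sentinel-2 band B8 (NIR) and band B4 (red).
    Taking the median over time damps one-off outliers such as clouds.
    """
    rows = len(nir_series[0])
    cols = len(nir_series[0][0])
    composite = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # NDVI for this pixel in every acquisition, then the median.
            values = [
                normalized_difference(nir[r][c], red[r][c])
                for nir, red in zip(nir_series, red_series)
            ]
            row.append(median(values))
        composite.append(row)
    return composite
```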

The details of this work are published in the journal PLOS ONE. Code and data are available on GitHub.

amazon mining watch

Amazon Mining Watch uses machine learning to map the scars left by mining activity in the Amazonian countries. By continually analyzing high-resolution and historical satellite images, the tool aims to identify the fast-paced growth of open-pit mining in the world's largest rainforest. The resulting database is meant to help journalists, activists, and researchers better understand the causes and impacts of the mining industry.
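One way to turn per-date mining detections into a growth figure is to compare binary masks from consecutive years and count the newly flagged pixels. The sketch below assumes co-registered masks (1 = mining) and a 10 m pixel size. That size matches Sentinel-2's finest bands but may not match the imagery Amazon Mining Watch actually uses. The function names are hypothetical.

```python
def new_mining_ha(prev_mask, curr_mask, pixel_size_m=10.0):
    """Hectares flagged as mining in curr_mask but not in prev_mask.

    Assumes both masks are 2D lists on the same pixel grid; the 10 m
    default pixel size is an assumption, not the tool's documented value.
    """
    new_pixels = sum(
        1
        for prev_row, curr_row in zip(prev_mask, curr_mask)
        for p, c in zip(prev_row, curr_row)
        if c and not p
    )
    # One hectare is 10,000 square meters.
    return new_pixels * pixel_size_m ** 2 / 10_000


def yearly_expansion(masks_by_year, pixel_size_m=10.0):
    """Map each year (after the first) to its newly mined area in hectares."""
    years = sorted(masks_by_year)
    return {
        curr: new_mining_ha(masks_by_year[prev], masks_by_year[curr], pixel_size_m)
        for prev, curr in zip(years, years[1:])
    }
```

Counting only pixels that switch from "not mined" to "mined" measures expansion rather than total footprint, which is closer to the "fast-paced growth" the tool tracks.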

Code and methodology are available on GitHub.

old faithful

Old Faithful is a geyser in Yellowstone National Park. After visiting Yellowstone in 2018, I wondered whether a neural network could tease out subtler correlations in the eruption patterns and forecast eruptions more accurately. Can we predict its next eruption using a neural network?
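Any neural network needs a baseline to beat. A long-standing observation about Old Faithful is that longer eruptions tend to be followed by longer waits before the next one. The sketch below therefore fits an ordinary least-squares line from eruption duration to the following waiting time and scores it with mean absolute error on a chronological hold-out. The function names and the toy numbers in the usage section are hypothetical, not real Yellowstone data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_wait(duration, slope, intercept):
    """Predicted minutes until the next eruption, given the last duration."""
    return slope * duration + intercept


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Made-up numbers (minutes), for illustration only.
    durations = [1.8, 2.1, 3.6, 4.0, 4.4, 1.9, 3.9]
    waits = [55, 58, 76, 80, 85, 56, 79]
    # Fit on earlier eruptions and evaluate on later ones, so the model
    # never sees the future it is asked to predict.
    slope, intercept = fit_line(durations[:5], waits[:5])
    preds = [predict_wait(d, slope, intercept) for d in durations[5:]]
    mae = mean_absolute_error(waits[5:], preds)
    print(f"wait = {slope:.1f} * duration + {intercept:.1f}; MAE = {mae:.1f} min")
```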

edges2legos

I trained a pix2pix image-to-image translation model to generate images of Lego sets from a user's sketches. Follow the link to try edges2legos in the browser; sketching works best with a tablet and a pencil or stylus.

Learn more about the datasets, model, and process here.
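pix2pix learns from paired images, so each training example needs an edge drawing matched to its source image. A minimal way to generate such pairs from rendered Lego images is a thresholded Sobel edge detector, sketched below on a grayscale image stored as a list of rows. Sobel is a stand-in chosen for simplicity. The edge extraction actually used for edges2legos may differ, and the threshold is an assumption to tune.

```python
import math

# 3x3 Sobel kernels for horizontal (x) and vertical (y) intensity gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_edges(gray, threshold=1.0):
    """Binary edge map (1 = edge) from a 2D grayscale image.

    Illustrative stand-in for edge extraction; border pixels are left at 0
    because the 3x3 kernel does not fit there.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    v = gray[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * v
                    gy += SOBEL_Y[i][j] * v
            # Mark pixels whose gradient magnitude reaches the threshold.
            if math.hypot(gx, gy) >= threshold:
                edges[r][c] = 1
    return edges
```

To mimic a pen sketch, the binary map can be rendered as black lines on white (for example, 0 for edges and 255 elsewhere) before pairing it with the original render.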

deoldify electron microscope

This was a simple project that used learned image colorization (DeOldify) to colorize images from a scanning electron microscope (SEM), which can only produce grayscale output. Although the model was heavily biased toward flesh tones on these out-of-domain samples, certain structures produced quite striking results. In the future, I would like to fine-tune the DeOldify model by adding some manually colorized SEM images to the training set, or to modify the colorization network to incorporate color hints given by the user.

center of mouse

I was working with a friend who studies neuroscience. Their lab spent a great deal of time watching videos of mice moving through experimental environments and recording their positions. I wrote a simple algorithm to track the position of the mouse throughout the test environment and compute basic statistics.
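One simple approach that fits this description is thresholding. If the mouse is darker than the arena floor, the centroid of the dark pixels in each frame approximates its position, and summing the distances between successive positions gives the path length. The sketch below assumes grayscale frames stored as lists of rows and a dark mouse on a light background. The threshold, frame rate, and pixel-to-centimeter scale are hypothetical parameters. It is not necessarily the algorithm that was actually used.

```python
import math


def centroid_of_dark_pixels(frame, threshold=50):
    """(x, y) centroid of pixels darker than threshold, or None if none.

    Assumes a dark mouse on a light floor; invert the comparison otherwise.
    """
    sx = sy = n = 0
    for y, row in enumerate(frame):
        for x, v in enumerate(row):
            if v < threshold:
                sx += x
                sy += y
                n += 1
    if n == 0:
        return None
    return (sx / n, sy / n)


def track(frames, threshold=50):
    """Estimated position (or None) for every frame in the video."""
    return [centroid_of_dark_pixels(f, threshold) for f in frames]


def path_stats(positions, fps=30.0, pixels_per_cm=1.0):
    """Total distance (cm) and mean speed (cm/s) over a sequence of positions.

    Frames with no detection (None) are skipped, so a gap is bridged by a
    straight line between the surrounding detections.
    """
    pts = [p for p in positions if p is not None]
    distance_px = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    distance_cm = distance_px / pixels_per_cm
    # Elapsed time spans from the first frame to the last frame.
    elapsed_s = (len(positions) - 1) / fps if len(positions) > 1 else 0.0
    speed = distance_cm / elapsed_s if elapsed_s > 0 else 0.0
    return {"distance_cm": distance_cm, "mean_speed_cm_s": speed}
```

A real setup would likely add background subtraction and some smoothing of the track, but the centroid-plus-distance structure stays the same.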

lego sorter

There are more than 10,000 different types of Lego bricks. Using simulated data generation in Unity, can we develop a Lego brick classifier?