33 Oxford Street, G-107
Cambridge, MA 02138

Capstone Research Project course

2019 Projects

Text-to-Image Recommendation Algorithm

Many publishers subscribe to the Associated Press (AP) so they can buy images to illustrate their newspaper or magazine stories. The main way to find images is to search the text associated with each image, for example, through http://www.apimages.com/. A text-to-image recommendation algorithm would enable AP to build a service that takes the text of a story and recommends photos from AP’s archive.
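As a rough illustration of the text-matching idea described above, the sketch below ranks image captions against a story using TF-IDF weights and cosine similarity. This is only a baseline under stated assumptions, not AP's method: the function names (`tokenize`, `recommend`), the captions, and the choice of TF-IDF are all hypothetical, and a real system would likely use learned text and image representations.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def recommend(story, captions, top_k=3):
    """Return indices of the top_k captions most similar to the story."""
    docs = [tokenize(c) for c in captions]
    n_docs = len(docs)
    # Document frequency: number of captions containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency, always positive.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}

    def vectorize(tokens):
        counts = Counter(tokens)
        # Terms that never appear in the archive carry no weight.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    query = vectorize(tokenize(story))
    scores = [cosine(query, vectorize(doc)) for doc in docs]
    # Sort caption indices from most to least similar.
    ranked = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Hypothetical captions standing in for an image archive.
captions = [
    "Firefighters battle a wildfire in California",
    "The president speaks at a press conference",
    "A soccer player celebrates a goal",
]
story = "Wildfire spreads across California as firefighters respond"
best = recommend(story, captions, top_k=1)
```

Here the story shares the terms "wildfire", "california", and "firefighters" only with the first caption, so that caption ranks first.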

Optimal Real-time Scheduling for Black Hole Imaging


In April 2019, the Event Horizon Telescope (EHT) Collaboration released the first image of a black hole. To accomplish this, the EHT used radio dishes across the globe to simultaneously record radio waves from near the black hole, synchronized by Global Positioning System (GPS) timing and referenced to atomic clocks for stability. EHT observations typically take place during a 10-12 day window, within which 5-6 days are triggered for observing when conditions are optimal. This project's goal is to use machine learning and/or prediction methods to help the EHT determine which nights should be triggered for global observations. This is an opportunity for students to work with EHT scientists and engineers on various aspects of black hole science in order to assess the probability that observations will lead to breakthrough results.

Neural Architecture Search (NAS)

Neural architecture search (NAS) has gained popularity in recent years, and the best-performing vision models often use architectures designed from data using machine learning (often called meta-learning). It is currently unclear whether the best-performing CNN architectures on one dataset of natural images are also the best-performing on other natural image datasets, but empirical results point to this possibility. An immediate question is whether these top-performing CNN architectures perform similarly well on scientific datasets. This research project would use random search as well as a DARTS-type approach to evaluate CNN architectures on natural image datasets and scientific datasets and compare the performance of architectures across these domains. If architectures that do well on natural images also do well on scientific datasets, then the field should employ the most recent CNN architectures on its scientific problems. If not, then architecture search should be employed on scientific datasets directly, an effort that this project would have started.
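To make the random-search baseline mentioned above concrete, here is a minimal sketch. The search space, the function names, and the `evaluate` callback are all hypothetical placeholders: in practice `evaluate` would train the sampled CNN on a given dataset and return a validation score, and real NAS search spaces are far richer than this one.

```python
import random

# Hypothetical search space, for illustration only.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6, 8],
    "channels": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
}


def sample_architecture(rng):
    """Draw one architecture uniformly at random from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def random_search(evaluate, num_trials=20, seed=0):
    """Evaluate num_trials random architectures and return the best one."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        score = evaluate(arch)  # e.g. validation accuracy after training
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


def toy_score(arch):
    """Toy stand-in for training: pretends 4-layer networks are best."""
    return -abs(arch["num_layers"] - 4)


best_arch, best_score = random_search(toy_score, num_trials=200)
```

Running the same search with one `evaluate` function per dataset (natural images versus scientific images) would give two rankings of architectures that can then be compared across domains.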

Creating and Leveraging Structured Text from the Vast Internet

Kensho, a private financial technology firm, aims to understand and predict financial markets, ideally by using the wealth of text data on the Internet. Most text on the web is unstructured and free-form, and thus challenging for algorithms to automatically reason about and leverage. Toward this goal, students will use public Wikimedia/Wikipedia data to create knowledge graphs. Additionally, students will investigate how to use this abundance of text to improve the performance of natural language processing tasks (e.g., named-entity recognition) that have historically relied on structured, hand-annotated data.

Daimler / Lab 1886: Helping Smart Cars Avoid Potholes and Other Dangers

Daimler, a German car manufacturer and parent company of Mercedes-Benz, is interested in creating a smart vehicle that can detect road hazards such as potholes, dangerous cracks, and unmarked construction zones. So far, data has been captured with smartphones mounted in vehicles. The goal is to develop algorithms that detect challenging road conditions and to construct reasonable metrics that quantify these conditions, which will allow Daimler to provide road summaries to the appropriate cities and municipalities.

Does Somerville really have a parking problem?

In this project, we will explore parking capacity in Somerville's public parking and private driveways using satellite images from 2017. Combined with an analysis of parcel and parking authority data, this will help determine current capacity and demand.

Spotify Challenge: Offline Recommender System

One of the main challenges for Spotify is to recommend the right music to each user. User satisfaction can be monitored based on whether users skip a recommendation. Therefore, the goal of a good recommender system is to show users content they like and to minimize the probability that they will skip a song. In this project, we present the problem of sequential music recommendations.

Future Projects

Automated Manufacturer Error Detection via Computer Vision

BASF, one of the largest chemical engineering companies in the world, plays a role in many of the products we buy. With such large-scale manufacturing, it would be immensely helpful to use AI to help automate some of the quality control and error detection in the manufacturing process.


Polymer foams are used in many applications. They are produced in a continuous or semi-continuous production plant. Sometimes the final foam material shows flaws due to small erratic variations in the process. Today, visual inspection is routinely done by people in the production plant. Unfortunately, this inspection cannot be done on every sample (lack of personnel). Therefore, this is a promising use case for automatic video recognition to detect errors digitally. Typical challenges are changing light conditions, changing white balance, changing contrast, vibrations in the video, and position changes of foam blocks.

For this project, BASF will provide videos of a particular production/assembly line, and the goal is to develop one or more robust computer vision algorithms that automatically detect, in real time, faulty items with cracked surfaces.




As the largest retail company in South America, the S.A.C.I. Falabella group is faced with diverse challenges that demand a thorough understanding of how customers interact with the many stores in the group (shopping centers, online stores, supermarkets). Falabella is implementing the click-and-collect (buy online, pick up in store) modality on a large scale. The goal is to develop prediction systems that forecast demand, optimize the placement of new stores, and identify problems with click-and-collect (e.g., wrong size) so that particular outcomes can be predicted (e.g., to which store a product will be returned).

Data-driven Lead Optimization via Artificial Heart Model

One of the main challenges in drug development is identifying candidates that will result in a safe and efficacious compound.  In this project, we propose to build a model to optimize the risk assessment of compounds early in the drug discovery process. The goal is to use this model to mitigate high-risk compounds and make them low-risk via chemical modification.

Intermittent Demand Forecasting


Wayfair’s catalog of over 14 million products includes a large quantity of “long-tail” products that sell in single-digit quantities per month or may experience long periods of zero sales followed by short bursts of double-digit demand. In these cases, where realized demand is both sparse and intermittent, traditional forecasting methods such as exponential smoothing and ARIMA produce inaccurate estimates.
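To illustrate why a method such as exponential smoothing struggles here, the sketch below applies simple exponential smoothing to a made-up intermittent series. The function name, the smoothing parameter `alpha`, and the demand values are assumptions chosen for illustration; they are not Wayfair data.

```python
def simple_exponential_smoothing(demand, alpha=0.3):
    """Return one-step-ahead forecasts, where forecasts[t] predicts demand[t].

    The first forecast is initialized to the first observation.
    """
    level = demand[0]
    forecasts = [level]
    for actual in demand[:-1]:
        # Blend the newest observation with the previous level.
        level = alpha * actual + (1 - alpha) * level
        forecasts.append(level)
    return forecasts


# Made-up monthly demand: long run of zeros, one burst, then zeros again.
demand = [0, 0, 0, 12, 0, 0, 0, 0]
forecasts = simple_exponential_smoothing(demand, alpha=0.5)
# Absolute forecast error in each period.
errors = [abs(f - d) for f, d in zip(forecasts, demand)]
```

On this series the forecast for the burst month is 0 (the burst is missed entirely), and the forecasts for the following zero-demand months are 6, 3, 1.5, and 0.75, so the method first under-predicts and then over-predicts.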