Stephen Yang — Mechanize

About

I build environments at Mechanize, Inc. that frontier AI labs use to train and evaluate coding agents on long-horizon software engineering tasks.

I led the development of GBA Eval, an open-source long-horizon coding benchmark. If you're interested in the work Mechanize does, you should check it out.

I'm also on leave from Stanford, where I studied mathematics. Previously, I worked on empirical economics research on innovation and patents, primarily using language models as a measurement instrument. I also competed nationally in mathematics and policy debate.

Work

2026–

Mechanize · Senior Software Engineer

RL environments and evaluation infrastructure for coding agents.

2025

Superscored · Co-founder

LLM-native adaptive tutoring.

Correspondence

yang29@stanford.edu

Selected Bibliography

Prior research at the intersection of machine learning and economics. Full abstracts on the papers page.

Yang, S. (2025). Understanding Innovation Quality and Success with Large-Language Models. Working paper. I use LLM embeddings of patent text to better understand innovation quality.

Cong, L. W., & Yang, S. (2025). Understanding Patenting Disparities via Causal Human+Machine Learning. Working paper. We develop a novel causal inference framework using ML and apply it to study USPTO decisions.

Yang, B., & Yang, S. (2025). Endogenous Fracturing under Partisan Voting. Working paper. A model of legislative coalitions, applied to the McCarthy speakership.

Yang, S. (2024). New Criteria for Triangle Similarity. Mathematics Magazine. Analytical proof of two new triangle similarity criteria involving angle ratios.

GitHub · Scholar · SSRN · LinkedIn