I build environments at Mechanize, Inc that frontier AI labs use to train and evaluate coding agents on long-horizon software engineering tasks.
I led the development of GBA Eval, an open-source long-horizon coding benchmark. If you're interested in the work Mechanize does, you should check it out.
I'm also on leave from Stanford, where I studied mathematics. Previously, I worked on empirical economics research on innovation and patents, primarily using language models as a measurement instrument. I also competed nationally in mathematics and policy debate.

Mechanize · Senior Software Engineer
RL environments and evaluation infrastructure for coding agents.
Prior research at the intersection of machine learning and economics. Full abstracts on the papers page.
Yang, S. (2025). Understanding Innovation Quality and Success with Large-Language Models. Working paper. I use LLM embeddings of patent text to better understand innovation quality.
Cong, L. W., & Yang, S. (2025). Understanding Patenting Disparities via Causal Human+Machine Learning. Working paper. We develop a novel causal inference framework using ML and apply it to study USPTO decisions.
Yang, B., & Yang, S. (2025). Endogenous Fracturing under Partisan Voting. Working paper. A model of legislative coalitions; applied to the McCarthy speakership.
Yang, S. (2024). New Criteria for Triangle Similarity. Mathematics Magazine. Analytical proof of two new triangle similarity criteria involving angle ratios.
© 2026 Stephen Qingyuan Yang
