Benchmark and evaluation systems for VLM research
Built evaluation paths that keep model comparisons useful for curation decisions and model iteration.
I work across multimodal pretraining, VLM evaluation, distributed training, and research infrastructure. Recent projects span data curation systems, vLLM eval paths, multi-node training, and agentic tooling that shortens experimental loops.
The through-line is fast feedback from data decisions to trustworthy model comparisons.
Three lines of systems work currently carry the research loop from benchmark design through training and deployment.
Built evaluation paths that keep model comparisons trustworthy for both curation decisions and model iteration.
Built ingestion and export paths that make large multimodal corpora easier to train on and inspect.
Added vLLM evaluation support and hardened multi-node launch and checkpoint behavior for faster experimental turnaround.
Two public artifacts still reflect the same research-engineering direction as the current work.