Seeing
Upstream
Every project I've worked on has taught me the same thing from a different angle: the hard part of perception is not the model. It's everything that happens before the model.
In the Bowden Lab at Vanderbilt, the problem was specular reflections. Bright spots on endoscopy video that washed out the tissue a surgeon needed to see. I built a pipeline called SpecReFlow that used optical flow to pull clean pixels from neighboring frames. The model worked. But the reason it needed to exist was a data problem: the camera was capturing the wrong thing.
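The flow-based replacement step can be sketched in a few lines. This is a minimal illustration, not SpecReFlow itself: the saturation-threshold detection, the `inpaint_specular` name, and the precomputed dense flow field are all simplifying assumptions.

```python
import numpy as np

def inpaint_specular(frame, neighbor, flow, thresh=0.9):
    """Replace saturated (specular) pixels with pixels pulled from a
    neighboring frame, warped by a dense flow field.

    frame, neighbor: (H, W) grayscale images in [0, 1]
    flow: (H, W, 2) per-pixel (dy, dx) displacement into the neighbor
    """
    h, w = frame.shape
    mask = frame > thresh  # crude specular detection by raw intensity
    ys, xs = np.nonzero(mask)
    # Follow the flow into the neighbor frame, clamped to image bounds.
    ny = np.clip(np.round(ys + flow[ys, xs, 0]).astype(int), 0, h - 1)
    nx = np.clip(np.round(xs + flow[ys, xs, 1]).astype(int), 0, w - 1)
    out = frame.copy()
    out[ys, xs] = neighbor[ny, nx]  # pull clean pixels from the neighbor
    return out
```

A real pipeline needs subpixel warping, occlusion handling, and a better specular detector than a fixed threshold; the point here is only the shape of the idea, clean pixels borrowed across time rather than hallucinated in place.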
At Modern Intelligence, the problem was modality laziness. We trained vehicle re-identification models across RGB, near-infrared, and thermal cameras. When you train modalities together with a shared loss, individual sensors stop trying. They coast on the gradient from the dominant modality. We showed, in work we called UniCat, that training each modality alone and concatenating the embeddings at inference produced better representations than any fusion method we tried. The model wasn't the bottleneck. The training setup was.
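The inference-time side of that result fits in a few lines. A minimal sketch, with assumptions labeled: the linear projections stand in for independently trained encoders, and the names `embed_unimodal` and `unicat_embed` are illustrative, not the production code.

```python
import numpy as np

def embed_unimodal(x, W):
    """Project one modality with its own encoder (here just a linear
    map), then L2-normalize so no modality dominates by scale."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def unicat_embed(modalities, encoders):
    """Concatenation-style inference: embed each modality with its
    independently trained encoder, then concatenate. No cross-modal
    fusion layer, so no modality can coast on another's gradient."""
    parts = [embed_unimodal(x, W) for x, W in zip(modalities, encoders)]
    return np.concatenate(parts, axis=-1)

rng = np.random.default_rng(0)
rgb, thermal = rng.normal(size=(4, 32)), rng.normal(size=(4, 16))
W_rgb, W_th = rng.normal(size=(32, 8)), rng.normal(size=(16, 8))
emb = unicat_embed([rgb, thermal], [W_rgb, W_th])
print(emb.shape)  # (4, 16)
```

The design point is that laziness is a training-time failure: because each encoder is optimized alone, its loss can't be satisfied by another sensor's signal.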
At Datology, the problem is curation. Which images and texts belong in a VLM training run, at what ratio, at what quality threshold. I build the infrastructure for those decisions: quality scoring, deduplication, diversity measurement, mixture optimization. The model is downstream of all of it.
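The shape of those decisions can be shown with a toy pass. Everything here is a stand-in: `curate`, the length-based quality score, exact-hash dedup, and the single mixture ratio are placeholders for learned scorers, near-duplicate detection, and full mixture optimization.

```python
import hashlib
import random

def curate(samples, quality_fn, min_quality, mix_ratio, seed=0):
    """Toy curation pass: score, threshold, exact-dedup, then subsample
    to a target mixture ratio. Real pipelines use learned quality
    scorers and near-duplicate detection; this shows only the order
    and shape of the decisions."""
    seen, kept = set(), []
    for text in samples:
        if quality_fn(text) < min_quality:
            continue  # quality gate
        h = hashlib.sha256(text.encode()).hexdigest()
        if h in seen:
            continue  # exact deduplication
        seen.add(h)
        kept.append(text)
    rng = random.Random(seed)
    k = max(1, int(len(kept) * mix_ratio))
    return rng.sample(kept, k)  # mixture ratio applied last

docs = ["a good doc", "a good doc", "ok", "another fine doc"]
subset = curate(docs, quality_fn=len, min_quality=3, mix_ratio=0.5)
print(len(subset))  # 1
```

Even in the toy version, the ordering matters: dedup before sampling, or the mixture ratio gets spent on copies.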
The pattern keeps repeating. The leverage is upstream.
Perception
I think a lot about why biological perception works differently.
Human perception doesn't fuse modalities the way a late-fusion model does. It doesn't even try. Different sensory streams stay partially independent, get integrated at multiple levels, and the system tolerates ambiguity instead of resolving it into a single embedding. That's closer to what UniCat stumbled into than what most fusion architectures try to do.
I don't have a grand theory here. I just notice that the hardest perception problems I've worked on were hard because we imposed the wrong structure on the input, not because we lacked capacity in the model. Biological systems don't seem to make that mistake as often. I want to understand why.
Robotics and embodied AI pull on this thread. Perception with physical consequences is different from perception for retrieval or classification. When your next action depends on what you see, the cost of misperception is immediate. I want to understand where that changes the design.
Aliveness
I picked up viola again after years away. I'm working on Max Bruch's Romanze, Op. 85. It uses a part of my brain that doesn't get exercised writing training configs. Listening to intonation requires a kind of attention that's closer to perception research than I expected.
I cook. Not in a content-creator way. In a "this is how I decompress and think about something other than loss curves" way.
I believe the purpose of life might just be to be alive. I don't say that as a brand statement. I say it because I spent my early twenties optimizing for achievement metrics and I'm trying to hold more than that now. This site is part of the effort. It's not a portfolio. It's where I try to keep the whole picture in one place.