Discussion about this post

Michael Geer

Agreed, domain-specific seems the likely progression, and we are seeing the first steps toward it.

What are your thoughts on moving away from training on language?

I'm a big fan of us humans and the language we produce, but if it is, say, medical/health foundational knowledge we are seeking to build, I would posit that we should be training on biological measurements and on data about the interventions/actions taken on bodies (along with the measurements on those bodies before and after). Possibly mixing in graphs of strongly established biological pathways, but leaving language from textbooks and scientific papers out of the training set, since it is an abstraction of data we can access more directly.

Adding a link to the post I made elaborating on another probable upside of using this type of time- and location-stamped data, as opposed to language, for the training set: https://michaelgeer.substack.com/p/large-biological-models-inherently

