Good Morning. Hello. How are you? #1459
What I want from a personal LLM. Does this exist?

Good morning. Hello. How are you? I am good. If all is well, and who can predict the future, I am driving on I-95 to Boston right now. By the time you get this we should be around DC, trying to get through DC while the express lanes are still running north. Wish us luck.
In actuality, it is about 30 minutes after I just wrote to you. In the interim, I have caught up on the day job and the pool job, checked the finances of both, checked the banks of both, checked who is out at work, talked to the CRO about a problem client, checked the status of both our solar systems, checked my retirement funds (I do it every day in contravention of all practical advice), checked my YouTube queue, checked a work Monday board that grants permissions to clients that I gotta check every morning, tagged my Quicken expenses, wrote a few emails, and wrote my 750 private words for the day, though today is the 27th so those were to my daughter for her future book.
Now I am here with you again.
And housekeeping item: I mislabeled yesterday’s issue as 1448, not 1458. I have fixed it in the archives but I am afraid your email archives will be permanently screwed. I am sorry.
Housekeeping item two: Two days ago I mentioned the new shitty NC Bathroom bill. And I also mentioned Derek Thompson living nearby in Chapel Hill. Both came from the Axios Raleigh local edition. I should have linked to it. Not trying to be the Times here with their BS URL policies. Here is the link. Apologies.

Today I am going to talk about — I am sorry — LLMs. And, specifically, what I would want from an LLM.
I do not know if this is possible. I suspect the answer is no. What I want does not seem that much different than the service OpenAI et al. offer to companies: use our LLM to feed your own documents into it so you can analyze your own documents.
But I suspect there are some key differences.

I should also say I think I am unique amongst humans, or at least incredibly rare, in that I have an absolutely enormous, just flat-out gargantuan corpus of words and documents to feed into this thing. To wit:
This daily email now numbers about 2 million words.
I have a personal journal I have been writing since 1987 that numbers over 4 million words.
I have about 2-3 million words of letters.
I have about 5 million words (rough estimate) of work product from a dozen or so jobs spanning 20+ years.
I probably have 3-5 million more words (rough estimate) of emails and AIMS.
I have my books, four published, maybe five more unfinished, another million words.
I don’t think it’s an unreasonable estimate to say that I could have 25 million+ words, perhaps a lot more. This is not internet-scale, but it is a lot for a single human.
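A quick tally of the itemized figures above, as a rough sketch in Python. The category labels are shorthand, and treating "over 4 million" and "another million" as exactly 4 and 1 is an assumption made only for illustration. On those assumptions the itemized lines come to roughly 17 to 20 million words, so the 25 million+ figure would depend on the rough estimates running low or on material not itemized in the list.

```python
# Word-count estimates in millions, as (low, high) pairs read off the list above.
# Single figures are entered as (n, n); "over 4 million" is treated as a floor of 4.
ESTIMATES_MILLIONS = {
    "daily email": (2, 2),
    "journal since 1987": (4, 4),
    "letters": (2, 3),
    "work product": (5, 5),
    "emails and AIMS": (3, 5),
    "books": (1, 1),
}


def total_range(estimates):
    """Sum the low ends and the high ends separately."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return low, high


LOW, HIGH = total_range(ESTIMATES_MILLIONS)
print(f"Itemized total: {LOW} to {HIGH} million words")
```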
It is a lot. It is also varied, and it comes very close to capturing all facets of an American adult human life: the professional, the personal, the public, the private, the artistic, the casual.
This leads to my first requirement: I do not want this shit ingested into an LLM for anyone else. I do not want it to become part of a corpus. I do not even want it to leave my servers. I can provide a robust, high-availability IT infrastructure for access to it, but I will do this at home, within my own walls, where it does not leave, or I will not do it at all. This is non-negotiable.
Other requirements:
I would require this to run on my own hardware, aside from a base, non-personal assistant LLM that helps me build the library and interpret it. The personal stuff and all processing of that data stays on my hardware, even if that is an absurdly expensive proposition.
In addition to ingesting my own work, I would like it to be able to ingest third-party works that are meaningful to me.
This would be for my private use. Things I have purchased. I will not use this commercially, and I do not intend to take other people’s works and ‘steal’ them. I just want to be able to personally scan things I already own. Including:
My 100-200 most important books.
My purchased music collection.
My purchased film collection.
I suspect this is basically impossible at the moment.
I do not want the LLM to imitate me or pretend it’s me or any of that shit. In fact, that is a deal-breaker.
It is a research assistant. It may draw inferences and interpretations of things, but it would put forth these interpretations as its insights, not mine.
It should be context-aware: it should know which of the corpus is personal, professional, private, public, artistic, casual, etc. and be able to factor that context into its inferences.
It should be able to cite its work. If I query it about what I was doing on X day, it should be able to point me to specific emails, journal entries, rough drafts, work product, IM chats, etc.
It would let me perform natural language actions, from loading the corpus to inquiring against it. And it would answer in natural language.
It can reason and offer opinions, when asked, about the corpus. It can also offer different interpretations of these opinions based on segments (personal v professional) of the corpus.
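The "context-aware" and "cite its work" requirements above can be sketched without any LLM at all. What follows is a minimal Python sketch, standard library only, that runs entirely on local data. Everything in it is hypothetical: the `Document` and `Citation` types, the `search` function, the file-style identifiers, and the sample entries and dates are invented placeholders, not a real product or anyone's actual files. It does plain keyword matching, not natural-language reasoning; the point is only to show each answer carrying a pointer back to a dated, context-tagged source.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    doc_id: str       # hypothetical identifier, e.g. a path on a home server
    written_on: date
    context: str      # e.g. "personal", "professional", "artistic"
    text: str


@dataclass(frozen=True)
class Citation:
    doc_id: str
    written_on: date
    context: str
    snippet: str


def search(corpus, term, contexts=None, on=None):
    """Return citations for documents containing `term`.

    `contexts` optionally restricts results to a set of context tags;
    `on` optionally restricts results to a single date.
    """
    needle = term.lower()
    hits = []
    for doc in corpus:
        if contexts is not None and doc.context not in contexts:
            continue
        if on is not None and doc.written_on != on:
            continue
        pos = doc.text.lower().find(needle)
        if pos == -1:
            continue
        # Keep a little surrounding text so the citation is readable.
        start = max(0, pos - 30)
        end = min(len(doc.text), pos + len(term) + 30)
        hits.append(
            Citation(doc.doc_id, doc.written_on, doc.context, doc.text[start:end])
        )
    # Oldest first, so results read as a timeline.
    return sorted(hits, key=lambda c: c.written_on)


# Invented placeholder entries, for illustration only.
SAMPLE_CORPUS = [
    Document("journal/sample-a.txt", date(2000, 1, 2), "personal",
             "Drove north toward Boston with the family."),
    Document("work/sample-report.txt", date(2000, 1, 2), "professional",
             "Reviewed the Boston client account."),
    Document("letters/sample-b.txt", date(1999, 3, 2), "personal",
             "Thinking about a trip to Boston someday."),
]

for hit in search(SAMPLE_CORPUS, "boston", contexts={"personal"}):
    print(hit.written_on, hit.context, hit.doc_id, "|", hit.snippet)
```

Filtering by `contexts` is a stand-in for the personal v professional split: the same query can be run against one segment at a time and the results compared.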
Does such a thing exist? I feel like the answer is no, right? I also feel like the answer is maybe “soon?”
I also feel kinda like.. I dunno. Like I have one shot. That these companies would steal my corpus. That I need to somehow be sure that my shit doesn’t leave my servers. Just T&Cs isn’t enough. Cuz that’s the thing about these LLMs: There’s no undoing them. They take your shit, they can never give it back. Not really.

I would add my daily Jane anecdote here, but who knows what has happened. Hopefully she is sitting contentedly in the back of the car watching Bluey. Hopefully she is wearing headphones and I can listen to music. Or maybe we are listening to the Jane road trip Spotify playlist. Oo, I guess I can share that with you today for the end-of-GMHHAY playlist, huh. Yes, okay, let’s do that.

Normally, obviously, I confine the daily playlists to one hour of runtime, so you, the reader, can listen to it that day. But this one is different. It is 20 hours long, enough to get us to Boston playing it the whole time if we have to, so Jane can like songs and daddy can like songs all at the same time.
Mix it up, have fun.
Oh man, it is 9:38 AM on Thursday and I have probably already written 4,000 words today. I am a bit done with writing for the day.
Have a lovely weekend.
—
Thanks for reading.
And hey! Maybe buy one of my books!
Good Morning, Hello, How Are You vol 1.