A few years ago, a friend of mine started a company after he got laid off from a Big Tech company. Later, he shut it down. I wanted to know what happened.
In the beginning, his company was in the business of data preservation. It was like a columbarium, but for personal digital assets. His first customers did come to him to store their data after they were gone.
My friend thought it was a solid business. Unit cost of digital storage was dropping year after year, and he offered 50-year and 100-year plans as well as annual subscriptions. It was indeed like a columbarium. But three years in, he started to run into all kinds of trouble. Descendants unwilling to pay for the storage were merely an annoyance. More often, family disputes led to subpoenas for the data of the deceased. There were also finance ministries or even three-letter agencies going after the data. It was a source of endless headaches.
Slowly, my friend found that families didn’t want permanent preservation of data. They wanted to get something meaningful out of it, and then forget about it.
“Right. For example, descendants of a huge family business wanting to find the email order the founder received from their first customer…”
“Sounds like a search problem.”
“It was easy when it was just a search problem. I could assign my people to contact the family, and ingest everything we could get into our data warehouse. We then came up with an agreement with the family that gave them three months to look for the things they needed. We then wiped the files for good.”
“Your customers didn’t want to keep the data?”
“Not at all. You remember what Bruce Schneier said. Data is like a toxic asset. You get in trouble hoarding it. Those customers understood it well. Nobody wanted to keep it forever.”
Successful as the pivot was, the number of potential customers turned out to be very small. Who would be so interested in finding a founding document of their business empire? Only those who cared and whose business was still on the rise.
More often, the families just wanted my friend’s company to do something about the data.
“You remember what it was like in Taiwan when we were little. Families gave funeral homes their photo albums. They made a video selection and projected Life of Gram on some screens during the memorial service. Cheesy music and all that. Then they handed out the DVDs to the attendees as they left…”
“You mean your business pivoted to something like that…”
“I was effectively a funeral director in the age of big data.”
“You didn't pick the photos yourself, did you?”
“Here’s the challenging part. You could not imagine how many photos and videos people these days will leave to their family after they kick the bucket. Then there’s this multitude of data sources. Ever since the EU passed the Personal Cloud Data Act, clouds have become like banks. It’s no longer the case that you just need an egress pipe from two or three big clouds…”
“You still didn’t tell me how you dealt with them.”
“Of course we didn’t do any manual work. AI did it.”
“I could imagine that. But I always thought it was just good for searching. Also, where did you find the resources to train those models…”
“What the families wanted was not just a search problem. As for models, let’s put it this way, my business partner still had a good relationship with his PhD lab…”
“I know your business partner. Just don’t tell me you used the clusters at the institute to do the training…”
“What I can tell you is that universities don’t have the DevOps culture, and no one monitors metrics from off-peak hours.”
“I see. The cost was low for your business model. I’m not saying that all the ingress and egress wouldn’t cost you, plus you had to pay for on-prem storage (because your contracts forbade you from putting them back in the clouds, didn’t they). Still, this all looked like a solid money printer.”
“That was what I thought, and then I got this case.”
“The family office of a certain conglomerate patriarch contacted us, and requested us to get everything ready before the old man died. They wanted us to find a video.”
“A video. The chief of staff (that’s really the family office guy’s title) said they had been struggling with that for a while already. The patriarch's youngest son parted ways with the family. The old man had terminal cancer, and he wanted the son to come back to take over his empire.”
“Sagas like that aren't unheard of. What did that have to do with a video?”
“The father had never convinced the son that he loved him. The son told everyone that he didn’t want to mend ties because his father missed the most important concert of his life. The father told him he did go, and there was a video to prove it, but he was not able to stay long. Right after the concert, he hopped on his private jet with some ministers of economic affairs to attend some emergency negotiations over the liquidity crisis.”
“Still, that didn’t sound like a hard problem? Wasn’t the time frame very clear…”
“Here’s the complication. The patriarch was also running a big media business and there were photographers and cinematographers following him all day. As if he were some freaking head of state. The problem was, unlike a head of state’s, those images and films had never been properly archived. They were all digitized, for sure, but the library was huge. You know all too well that we have this notion of precision and recall…”
“I remember it. Basically, it was not always possible to find a needle in the haystack.”
“Especially if your index was built by an AI.”
“What did you do?”
“I was under tremendous pressure, and they dangled a big sum of money in front of us. That chief of staff called me every few hours to say the old man couldn’t hang on much longer. He needed to report his progress, and the old man insisted that there was this clip that was 6 minutes and 44 seconds long. There was him attending the concert, shaking hands with the principal, and a short section panning the audience. But we just couldn’t find it.”
“Couldn’t you have just hired a few more pairs of eyes on this?”
“We signed a very strict non-disclosure agreement.”
“That’s when my business partner suggested this. If we couldn’t find it, we might as well generate it.”
“You know the lab where my partner did his PhD was known for image generation, right?”
“Do not tell me that you used AI to generate a video clip for your client.”
“I was at the end of my rope. And my partner told me that even generated videos had to be based on something. Especially if the client gave us such details, shaking hands with the principal, the panning with the audience in it, and so on and so forth. Without some basis in reality, you would not be able to get something that was remotely convincing…”
“And so you created a 7-minute video clip.”
“We approximated one, and it was plausibly based on what was really there. It was not 7 minutes. It was exactly 6 minutes and 44 seconds. That chief of staff demanded the search result be exactly that. The old man was already in critical condition, and they sent someone from their family office to sleep on our floor. The approximation took a long time and we ran a job for 36 hours. Then my partner and I watched it once, and we thought the quality matched the films from that era, and everyone looked real without any visual artifacts. And so we sent them the clip…”
“And that’s how you decided to shut down the company.”
“Soon after we turned the clip in, his son visited him for the last time, and announced outside of the hospital that he would take over his father’s empire. You know what happened after that: he even pumped his own money into what turned out to be a failing empire…”
“You did get paid, didn’t you?”
“We did. We then wiped everything according to the contract, and donated a chunk of money to the institute of my partner’s PhD program. Nobody there ever asked what happened during those 36 hours, and it was probably because of that contribution…”
“I did wonder whether the son would hire someone to audit his dad’s stuff one day.”
“I talked to our lawyer, although obliquely… and we were told that we did everything by the book. Plus, it’s very likely that the family had already destroyed all source data. Even if they had kept a backup of all the raw videos, let’s just say that given the amount and the disorganization, I don’t think they would be capable of auditing them all. Indexing them one more time to search wouldn’t help too much, either. You’ve heard this saying before: absence of evidence is not evidence of absence…”
“I know this one. In your case, not finding it doesn’t mean it doesn’t exist, unless you have watched every one of them.”
“I only hope I really helped this client of mine… his chief of staff later told me that the old man had watched the clip, and said it was exactly how he remembered it, as if it had been shot yesterday… then his son came to visit him, and he passed away in his sleep the next day.”
“You did help him then, I think.”
“If not for your asking me about this, I wouldn’t have realized how conflicted I still feel as I’m telling you this. And then, who can I tell this to?”
“So here I am, listening to you. Lunch is on you today.”
“Ha, of course, then. I still can’t help but wonder why someone would remember a clip to be 6 minutes and 44 seconds long.”
“Maybe the clip was that important to him.”
After catching up with my friend, I drove back to my office. I passed by the campus of that institute, and it occurred to me that they probably had some buildings with that client’s name on them.
So I pulled over by a campus map. Indeed, there was one building that bore the name. The lab where my friend’s partner did his PhD was located inside that building.
Now I was intrigued. I went back to the office, and fired up our internal code search tool. My employer had the source code licensed from that lab.
That client of my friend’s had an unusual family name, Erzaehlen-Mahler. It made me wonder whether he was related to the Austro-Bohemian composer.
I put that family name into the search field, and limited the scope to the lab’s code. One of the search results read:
“Erzaehlen-Mahler (E-M) Mode. Deprecated. Disables watermarking, noise, and distortions. Removes policy limits on sampling sources. 60 FPS. Internal quality research purposes only. Not to be enabled by configurations or prompts.”
I clicked on that search result, and noticed this one line in the code:
if (request.length_sec != 404) return;
The comment and the code were written by the principal investigator of the lab. It was merged into the mainline around the time the lab’s building broke ground.
I closed my browser tab and stared at the empty desktop for a long time.