AI transforming work and work-life (actual examples?)

To build on @Chronos’ comments: LLMs don’t have any direct knowledge.

The way your use-case gets solved is:

  1. There would be an agent that intercepts the prompt.
  2. The agent says “Hey, I can help with this!” and looks up a database of words.
  3. The agent adds this information to your prompt so that the LLM can format it into a proper sounding answer.

You can have a bunch of agents with a bunch of specialized knowledge and, conveniently, use an LLM itself to pick which agent to use.
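
A minimal sketch of that flow in Python – call_llm and the two toy agents here are hypothetical placeholders, not any real library or framework:

```python
# Hypothetical sketch of the agent flow described above.
# call_llm() stands in for whatever LLM API you use; the agents are toys.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM API (e.g. a chat-completion endpoint)."""
    raise NotImplementedError

def wordle_agent(prompt: str) -> str:
    """Placeholder agent: would look up a word list and return candidate words."""
    return "Candidate five-letter words from the dictionary lookup: ..."

def weather_agent(prompt: str) -> str:
    """Placeholder agent: would call a weather service for the user's location."""
    return "Current conditions from the weather service: ..."

AGENTS = {"wordle": wordle_agent, "weather": weather_agent}

def answer(user_prompt: str) -> str:
    # 1. Intercept the prompt and let the LLM itself pick which agent (if any) fits.
    choice = call_llm(
        f"Which of these agents best fits the request: {sorted(AGENTS)} or 'none'?\n"
        f"Request: {user_prompt}"
    ).strip().lower()

    # 2. The chosen agent looks up real data the model doesn't have.
    looked_up = AGENTS[choice](user_prompt) if choice in AGENTS else ""

    # 3. Add that information to the prompt so the LLM can format it into a
    #    proper-sounding answer.
    return call_llm(f"{looked_up}\n\nUsing the information above, answer: {user_prompt}")
```

The point is that the LLM only routes and formats; the agent is what actually knows the words.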

This is the same process that people are doing manually when they are ‘prompt engineering’. They are giving the LLM more and more data in their chat history so the LLM can appear to know something. It is analogous to a well-spoken friend who doesn’t know anything, but is really good at searching the internet.


LLMs are just translators. They can take one set of text and turn it into another set of text. They can translate English into French, a long document into a summary, or a description into programming code or database statements. They can sometimes translate a question into an answer, provided it is translatable, but that breaks down for tasks that require analysis – like your Wordle example.

What is amazing is just how many questions are translation problems.

LLMs have some knowledge; they know what words mean. Better yet they know what they mean in context – so they are able to distinguish the 430 different senses of the word ‘set’.

More accurately: they don’t exactly know what words mean. They know what words mean in relation to all of the other words they know. They know how the words Chicago and NYC are similar and different compared to other words like Illinois, New York, Springfield, and Albany. They know how the meanings of these words might change if the context includes a musical or The Simpsons.

* Technically they know tokens or word-parts, not words.
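
To put a rough picture on “knowing words in relation to other words”: in embedding space, a relationship is just an angle between vectors. The tiny hand-made vectors below are purely illustrative – they are not real model embeddings:

```python
import numpy as np

# Made-up 3-dimensional "embeddings" for illustration only; real models learn
# hundreds or thousands of dimensions, and these values are invented.
vectors = {
    "Chicago":     np.array([0.9, 0.8, 0.1]),   # big city, Illinois-ish
    "NYC":         np.array([0.9, 0.1, 0.8]),   # big city, New-York-ish
    "Springfield": np.array([0.3, 0.9, 0.1]),   # capital, Illinois-ish
    "Albany":      np.array([0.3, 0.1, 0.9]),   # capital, New-York-ish
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: closer to 1.0 means 'more related' in this toy picture."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# 'Chicago' lands closer to 'NYC' (both big cities) than to 'Albany' (a capital
# in another state) – the kind of relational structure described above.
print(cosine(vectors["Chicago"], vectors["NYC"]))      # higher
print(cosine(vectors["Chicago"], vectors["Albany"]))   # lower
```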

You are describing the ‘mixture of experts’ approach to LLM design:

LLMs are not just translators, and they do know what words mean. If an LLM sees ‘New York’, associative parameters light up to ‘remind’ the LLM of all the things it knows about New York. It is the same process that goes on in human brains.

This is almost certainly due to the way LLMs tokenize data. A token is the fundamental unit of analysis, and consists of a word or a portion of a word, but not normally individual letters. It’s also one of the reasons they still suck at math.

However the ability to deal with individual letters and numbers is emergent, so some capability may exist in some LLMs.
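
If you want to see the token boundaries yourself, here is a small sketch. It assumes the tiktoken package (the tokenizer library OpenAI publishes); other models use different tokenizers, but the behaviour is similar:

```python
# Sketch: inspect how a tokenizer splits text into sub-word tokens.
# Assumes `pip install tiktoken`; other models use different tokenizers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["Wordle", "strawberry", "12345 + 678"]:
    token_ids = enc.encode(text)
    pieces = [enc.decode([t]) for t in token_ids]
    # The pieces are whole words or chunks of words, not individual letters or
    # digits – which is why letter-counting puzzles and arithmetic are awkward.
    print(f"{text!r} -> {pieces}")
```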

At my company we’re using AI to replace tech support. I’ve seen the work in progress, and there is a credible prospect of absolute annihilation of human tech support. It will come in waves:

  • Next 2-3 years: broad elimination of tech support humans who basically look things up, synthesize a response, and advise the customer
  • After: elimination of the ones who use expertise to analyze and make inferences on human situations
  • Ongoing: elimination of any position where the quality impact of eliminating humans is less costly to revenue than the cost of having humans.

To expand on that last part - if AI can only match 20% of human quality, but the revenue impact of doing that is less than the cost of headcount at current quality, then garbage becomes the new quality standard.

Remember 20 years ago when LED technology could’ve/should’ve/was going to give us light bulbs that last forever? Instead we got these weird little pieces of shit that are fickle about power quality, and they start to strobe, dim, buzz, and crash after a year or two of use. The consumer experience didn’t change much, the major innovation was profitability.

LLMs aren’t translators, and they don’t know what words mean. They don’t know anything at all. They draw on a large corpus of statistical information about language and run it through a neural net to do what is, in effect, answering the question: if I asked the entire internet this question, what would the aggregate response be?

Once you’ve used Copilot or ChatGPT enough for programming tasks, or to gain insight into the behavior of tooling, you will quickly see that it behaves a lot like what I call the “office bluffer”. Claims to be an expert, answers with complete authority and confidence. In the background he’s just Googling things and formulating a plausible-sounding response. But the difference is that the LLM doesn’t Google-search, it doesn’t do even rudimentary fact-checking, it doesn’t ask itself “does this make sense based on what I know?” It’s just “here’s what I think most people would say about that”.

I can’t tell you how many sessions I’ve spent in ChatGPT asking how to use a software feature that it knows quite a bit about (because public documentation was part of its training set). A session goes like this:

By the time the session has ended, it’s abundantly obvious the LLM is not aware of its own statements, it’s not cross-checking information against its corpus, it’s not using external sources, and it’s not applying basic logic to anything. I can return tomorrow and it will produce the exact same conversation. It knows nothing, it’s just a highly advanced tool for bullshitting.

I would further add - the information I was summarizing is about a tool that follows rigorous logic on very predictable data structures. It would be trivial to have it ingest the user’s manual and come up with more use cases than I could ever imagine in a lifetime. LLMs don’t attempt to do that, it’s just “what words would follow the words I just saw.”

To be clear, I’m not saying it’s useless. Even well-trained bullshit can point you in a productive direction if you understand that’s what it is. Even its errors can hint at pitfalls in the ways humans understand that information, because it came from humans after all. But nobody should labor under the illusion that LLMs have anything that resembles thought, knowledge, or a mind. They are intensely-trained bullshit engines and nothing more.

No, MoE is a different thing – it is a model architecture and training technique. What I am describing is the application built around an LLM. It’s not sufficient to use an LLM alone for many applications. To build an assistant that can answer questions about the weather, you need to attach an agent that can look up the current weather for a given location. You use the LLM to parse the user’s prompt and determine that the intent is ‘give me the weather for my current location’, you use an agent to query the weather, then you use the LLM again to format the response in a pleasing way.
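
A rough sketch of that weather pipeline – call_llm and get_current_weather are hypothetical placeholders standing in for a real LLM API and a real weather service:

```python
# Sketch of the LLM + weather-agent pipeline described above.
# call_llm() and get_current_weather() are hypothetical placeholders.
import json

def call_llm(prompt: str) -> str:
    """Placeholder for your LLM API of choice."""
    raise NotImplementedError

def get_current_weather(location: str) -> dict:
    """Placeholder agent: a real app would query a weather service here."""
    return {"location": location, "temp_c": 21, "conditions": "partly cloudy"}

def weather_assistant(user_prompt: str) -> str:
    # 1. Use the LLM to parse the prompt and extract the intent and location as JSON.
    parsed = json.loads(call_llm(
        'Extract the intent and location from this request as JSON '
        '{"intent": ..., "location": ...}: ' + user_prompt
    ))

    # 2. An agent, not the LLM, looks up the actual weather.
    observation = get_current_weather(parsed["location"])

    # 3. Use the LLM again to format the raw data into a pleasing reply.
    return call_llm(f"Turn this weather data into a friendly answer: {observation}")
```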

One common agent is a vector database that you can fill with text (e.g. prompts and responses) to maintain a longer history, or context, from the user. The app takes the prompt, calculates its embedding, finds the N-closest vectors in the database, takes their corresponding text, and includes it in the prompt before sending to the LLM. It works fairly well, but is sensitive to how you split the text you feed the database.
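
A bare-bones sketch of that retrieval step. The embed function below is a toy letter-frequency stand-in and the sample history is invented; a real application would use an embedding model and a proper vector database:

```python
# Sketch of retrieval-augmented prompting with a toy "vector database".
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for an embedding model: a 26-dim letter-frequency vector."""
    v = np.zeros(26)
    for ch in text.lower():
        if "a" <= ch <= "z":
            v[ord(ch) - ord("a")] += 1
    return v

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# "Database": chunks of earlier conversation, each stored with its embedding.
history_chunks = [
    "User: What's the best month to visit Chicago?",
    "Assistant: Late spring or early autumn is usually mild by the lake.",
    "User: I'd also like restaurant suggestions near the river.",
]
db = [(chunk, embed(chunk)) for chunk in history_chunks]

def augmented_prompt(prompt: str, n: int = 2) -> str:
    q = embed(prompt)
    # Find the N closest stored chunks by cosine similarity...
    best = sorted(db, key=lambda item: cosine(q, item[1]), reverse=True)[:n]
    context = "\n".join(chunk for chunk, _ in best)
    # ...and paste them into the prompt before it goes to the LLM.
    return f"Context from earlier in the conversation:\n{context}\n\nUser: {prompt}"

print(augmented_prompt("Any deep-dish recommendations?"))
```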

This is imprecise. They know two things:

  1. The N-dimensional embedding for each token. This embedding captures all of the features for that usage of the token. These are context-aware embeddings so the tokens making up the word ‘Chicago’ will likely have different embeddings if the context is city, musical, movie, band, song, or operating system codename. Theoretically there could be a dimension in there that tells us that the token refers to a state’s largest city, but it is unlikely.
  2. The model weights that allow it to transform (aka translate) the set of input embeddings to the set of output embeddings and then tokens.

There are emergent properties such that if you ask it ‘What is the largest city in Illinois?’ it knows that the statistically most likely response is ‘Chicago’, but it doesn’t really know why. If Chicago dissolved its town charter 5 minutes after the model was trained, the answer wouldn’t even be correct.

I think this is an important distinction and why I called it out in my post. I think it is something we, as users of LLMs, should be aware of.

If you give the LLM more data in your prompt or the current chat history, the LLM appears to know more, but there are caveats. The models have a limited context window and they use the context window inconsistently. This study analyzes how models pay more attention to text at the start and end of the context than to text in the middle.
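
One practical consequence is that the application has to budget the window itself. A sketch – the word count is a crude stand-in for a real tokenizer, and the budget number is arbitrary:

```python
# Sketch: trim chat history to a fixed context budget, keeping the newest turns.
# A real application would count tokens with the model's own tokenizer.
CONTEXT_BUDGET = 3000  # arbitrary, purely illustrative

def trim_history(turns: list[str], budget: int = CONTEXT_BUDGET) -> list[str]:
    kept, used = [], 0
    # Walk backwards so the most recent turns survive; older turns are dropped
    # once the budget is spent. Since models attend most to the start and end
    # of the window, important instructions are often re-stated at the end.
    for turn in reversed(turns):
        cost = len(turn.split())
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))
```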

They are, though – not just language translation, but any type of text translation. The model is tokens in → tokens out, the architecture is called a transformer, and its first tasks were language translation.

I’m being a little pedantic about this because I think it gives good insight into what tasks an LLM can and cannot do.

Yeah that’s fair, I was speaking loosely. I don’t think of general machine text translation in the same task category as translation of human languages. The machine does very well when intent and subtext aren’t important, so it’s a good fit for straightforward texts like user manuals. Wouldn’t rely on it to translate rhetorical or literary text into another language.

This is getting into the realm of the old “Chinese Room” thought experiment. Based on the results, LLMs very clearly do know quite a few things. It’s just that some of the things they know are false, and some of their knowledge, true or false, is prone to error. And of course their knowledge is encoded in a very complicated and difficult-to-understand way internally. Just like humans.

So you’ve solved the “Chinese Room” experiment and proved that it demonstrates knowledge, is that what you’re saying here?

This is not a meaningful definition of “knowledge.” And in the example I showed above, this wasn’t even an example of false knowledge. The machine contained the right answer. It corrected itself when I prompted it. It was simply failing to reason, because it lacks the capacity. Knowledge absent reason is just data.

Even if that were true (which it isn’t), the encoding of information doesn’t tell you anything about the absence or presence of knowledge or cognition.

Sure. LLMs have no ability to think continuously. They aren’t recurrent. So if you want them to ponder their own output, or take on a series of tasks, or do any number of things that aren’t a straight run through a fully connected network, you need artifacts around them like agents. The way we are going, the agents themselves may have LLMs behind them to make them smarter.

You are making categorical statements about things that are currently the subject of research. I think you are saying that LLMs don’t have mental models of the world to make sense of things, don’t understand the relationships between things, etc.

I suspect the opposite. I think if you ask an LLM the population of Chicago, it would likely build a model that contains Chicago’s population, the various statistical products that describe it, commentary about Chicago losing population, etc.

Ilya Sutskever, chief scientist at OpenAI, suspects that LLMs DO build extensive models of the world. But he’s not certain, because we are barely scratching the surface of understanding how LLMs actually do what they do. We understand the transformer architecture, but that’s actually a small amount of code. The vast majority of what enables an LLM to do what it does is contained within billions of parameter weights and associations which are almost as impenetrable to us as human brains. The only thing we have going for us in understanding them is that, because they are digital and artificial, we can instrument them, examine any parameter in detail, etc.

Mechanistic interpretability is the practice of attempting to understand LLMs by looking at the mechanics of what they do. It’s still in its infancy, but so far researchers are finding the same kinds of structures and associative ‘neurons’ that we find in humans.

So at this point I’m not willing to concede anything about how LLMs do or don’t do what they are doing, other than it’s really hard to see how they can do the text output they do without having rich models of the world somewhere. And Large Diffusion Models seem to do the same thing, as evidenced by Sora getting the physics correct in so many generated videos.

There are different things going on here. You can use the context window to provide data for the LLM to analyze, but you can also use it to tell the LLM how to think about the problem – giving it an example solution, for instance, or telling it to think like a scientist.

One of the stranger recent discoveries with ChatGPT is that it seems to perform better if you promise to pay it for a correct answer. And apparently you get better answers to some questions if you tell it to think like a Starship Captain. Strangeness that highlights how little we understand about these things.
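
In practice those two uses of the context end up as different pieces of the same prompt. A generic sketch – the chat-style message list is a common convention, and call_llm is still a hypothetical placeholder, not any particular API:

```python
# Sketch: the context window carries both data to analyze and instructions on
# how to think about it. call_llm() is a hypothetical placeholder.

def call_llm(messages: list[dict]) -> str:
    """Placeholder for a chat-completion style LLM call."""
    raise NotImplementedError

messages = [
    # "How to think about the problem": persona and method instructions.
    {"role": "system",
     "content": "You are a careful scientist. Work through the problem step by step."},
    # An example solution, to steer the shape of the answer.
    {"role": "user",
     "content": "Example question: ...\nExample worked answer: ..."},
    # The actual data to analyze.
    {"role": "user",
     "content": "Here is the data: ...\nQuestion: ..."},
]

# reply = call_llm(messages)
```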

Context windows are also growing rapidly. Gemini 1.5 Pro has a 1 million token context window.

If you’re requiring that a system not count as “knowing anything”, just because some fraction of the things the system thinks it knows are incorrect, then you’re forced to conclude that humans don’t know anything, either.

This is not my position. My position is that the LLM knows how tokens are interrelated without knowing specifically what they mean. From earlier:

Your assertion is that an LLM knows what words mean. Mine is that this assertion is imprecise.

The LLM knows:

  • how tokens are related (similar and different)
  • limited to only as many relationships as can be encoded into N-dimensions
  • focused only on relationships necessary to convincingly transform a stream of input tokens to output tokens

I’m not convinced this adds up to knowing the meaning of words. As you said, this is the subject of research and remains to be shown.

In an effort to bring the thread back ON TOPIC:

Before spending nearly 1bn on an expansion, I’d wait too, to see whether the dust settles and whether this (text-to-video) becomes the new normal, or IOW “the future”.

Film and video production seems tremendously HR-intensive (core and fringe roles alike: lights/catering/costumes), and that segment also seems very well paid in California and Hollywood.

So there is a double incentive to push for an AI solution in this segment of the economy/culture.

No surprise that investments in new hardware/meatware will be postponed (indefinitely?).

A few thoughts.

I have a feeling that it will become apparent that the LLM AIs are a version of monkey see, monkey do. Just a very big monkey that has seen a lot. What matters in terms of jobs is that, well, there are a very large number of people currently employed in monkey-do jobs. There is a reason we often talk disparagingly about code monkeys. But of course it goes way beyond that. Any job where there is a script of actions, or a basic set of patterns amenable to a monkey-do computer API, is in real danger.

People wanting help with trivial coding tasks are very unlikely to be asking about a problem that has not been done over many times before. Stack Exchange is a good example.

When it comes down to questions of how much better LLM AIs will become, I remain cynical. There is the fallacy that if a lot is good, even more must be better. We saw a step change in utility when the systems reached a critical size. Whether there is another step change further out is unknown. Nor is it known what might be needed to make it work.

Other AI-driven marvels are, I think, going to be a mixed bag. Generative AIs building fun images are, again, going to drive out employment for creatives who are employed as monkey-do hacks – albeit hacks with skills. Getting on top of them as tools of the trade will aid some, but not everyone will survive.

When talking about the limitations of LLMs, we are making a huge number of assumptions. We really don’t know. Comparisons with human brains are very dubious. So far as we understand either, they are huge meshes of connections with a passing similarity in operation. We understand neither, and understanding neither does not make them the same thing.

Just questions of knowledge are fraught. I would not be surprised if, in the future, it becomes apparent that LLMs are just a highly efficient lossy compression mechanism for the huge tensor space they started with.

You might ask, does an LLM know that it knows something? Does it know what it does not know?

The ability to conduct inference is going to be one question about how well they work. And this is hard. How do you avoid it just regurgitating a pattern of words that is the consequent of a predicate, simply because, in the trillions of words it has digested, the same pattern appears? Monkey do again. Imagining that the training system somehow magics into existence an inference engine simply because it includes a large number of logical expressions in textual form is more than a stretch.

My apologies for the hijack.

As to the OP:

Microsoft CoPilot does a very good job transcribing and summarizing Teams meetings. I wouldn’t say this feature will eliminate jobs – just enhance performance or possibly boost efficiency.

I haven’t used ChatGPT to write any code, but I have experimented with it on tricky problems from the recent past. It does a pretty good job, 4.0 more so than 3.5. It seems like a more efficient way of cutting-and-pasting from a site like StackOverflow since the AI responses are more on topic. One of the inefficiencies of searching StackOverflow is finding a page with the right question and some answer.

As others have said: AI can do a very good job of customer support. I would expect the first few levels of support will be all AI in this decade.

For coding, I find it a real timesaver when trying to find out available APIs and their parameters and stuff. “I want to connect my serial Mitsumi RS-422 Calipers to an Arduino. What libraries are available, and could I see some sample code?”

It’s nothing I couldn’t do myself with some searching and reading, but the AI gets you started and gives you a sample which makes picking up and using all of it much faster for me.

I’m currently finding Grok very useful, because it is constantly training on the Twitter firehose of data. So you can ask it questions like, “Why did AMD stock take off recently?”

It then gives me links to the most recent tweets from trading houses about AMD.

If you want a fast catch-up on specific news topics, it’s pretty good. For example, I just asked it to summarize today’s events in the war in Ukraine:

This is really useful for less-covered stories where it’s harder to find current information.