{"id":827,"date":"2026-04-24T17:14:19","date_gmt":"2026-04-24T17:14:19","guid":{"rendered":"https:\/\/cybersecuritysanity.com\/?p=827"},"modified":"2026-04-24T17:14:20","modified_gmt":"2026-04-24T17:14:20","slug":"preview-of-another-book-chapter","status":"publish","type":"post","link":"https:\/\/cybersecuritysanity.com\/?p=827","title":{"rendered":"Preview of Another Book Chapter"},"content":{"rendered":"\n<p>I have decided to publish a second rough draft of a book chapter.  One of my primary reasons is to get the content out as soon as possible, since my motivation is to help with the extreme misinformation being circulated about AI.<\/p>\n\n\n\n<p>The idiotic &#8220;doom&#8221; movement continues to operate with seeming credibility despite the childish message that superintelligence will take over the world, like some kind of cartoon villain.  Unfortunately, refuting this is hard, because adherents have a completely wrong understanding of how the technology works.<\/p>\n\n\n\n<p>It is clear that to resist this lunacy, along with the equally stupid idea that models might be conscious or have feelings, you simply need to know how this actually works.  That&#8217;s what &#8220;Understanding AI&#8221; is all about.  That&#8217;s why I&#8217;m writing the book.  It&#8217;s a decidedly not dumbed down primer on AI.  It explains the theory, early evolution, why things are done as they are, neural networks, deep learning, natural language processing and generative AI.<\/p>\n\n\n\n<p>This unapologetic adult and broad primer will help make anyone immune to the extreme cultish nonsense that has surround the subject.  It does not go into depth more than it needs to, but it assures all relevant concepts are covered in a computer way that does not insult the readers intelligence.<\/p>\n\n\n\n<p>This is not the second chapter of the book, but rather the 7th.  It is, of course, subject to change, as it is a draft.  
But putting it out there, if nothing else, holds my feet to the fire to get it done.  I&#8217;m sure it will receive at least three major revisions, but so far I have not seen such a comprehensive account of the dawn of generative AI from anyone inside the industry.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p class=\"has-medium-font-size\"><strong>The Dawn of the Generative Era<\/strong><\/p>\n\n\n\n<p>If there is one thing that truly stands out about the generative AI revolution, it is how suddenly it seemed to arrive. In late 2022, ChatGPT was released to the public. Shortly after, Google released its own conversational AI system. Around the same time, text-to-image generators began to appear. To those outside of AI development labs, the transition was startling. It felt as though artificial intelligence had gone from an abstract concept\u2014something used quietly to optimize systems\u2014to a technology capable of engaging in fluid, human-like conversation almost overnight.<\/p>\n\n\n\n<p>What made this shift feel so jarring was not just the capability itself, but how closely it matched long-standing expectations. For decades, science fiction had conditioned people to think of artificial intelligence as something you could talk to\u2014something that could respond, reason, and interact in natural language. Earlier chatbot systems had existed, but they were limited, brittle, and often frustrating to use. What appeared in 2022 was fundamentally different. It was not merely an incremental improvement. It felt like a categorical shift.<\/p>\n\n\n\n<p>Even within the technology sector, the transition felt abrupt. While the foundational ideas behind generative models had been developing for years, they were known only to a relatively small segment of researchers and engineers. Prior to 2022, most software development had little to do with artificial intelligence, and even less with large-scale generative systems. 
Only a handful of organizations\u2014such as OpenAI, Google, and academic institutions like Stanford University\u2014were working at the forefront of this technology. Even then, the systems being developed were experimental and not widely deployed.<\/p>\n\n\n\n<p><strong>Foundations of Generative Models<\/strong><\/p>\n\n\n\n<p>Computer-generated content is not itself new.&nbsp; Algorithms have long existed that could generate complex, detailed reports and documents based on templates, conditions and rules, and various content creation systems existed for other material.&nbsp; However, these systems were fundamentally constrained in their ability to generate content.&nbsp; They could only recombine rules and templates that were already loaded.<\/p>\n\n\n\n<p>In principle, deep learning could produce much more dynamic and varied content.&nbsp; One of the earliest practical examples of generative AI came from a model type known as a Generative Adversarial Network, or GAN.&nbsp; The first GANs were introduced in 2014.&nbsp; The concept of a GAN is extremely clever. It consists of two neural networks, a generator and a discriminator, with adversarial jobs.&nbsp; The generator is trained to output images, and the discriminator is trained to detect AI-generated images.&nbsp; This dynamic results in a feedback loop of image improvement.&nbsp; The generator is pushed to get better at generating realistic images, while the discriminator gets better and better at telling when an image is AI-generated. As a result, artifacts and inaccuracies are reduced and the system optimizes toward realistic image generation.<\/p>\n\n\n\n<p>GANs were the first true form of generative AI that had much use beyond experimentation.&nbsp; At the time, the ability to generate entirely novel, realistic images was considered a major achievement. 
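<\/p>\n\n\n\n<p>To make the adversarial loop concrete, here is a minimal sketch in Python.&nbsp; It is illustrative only: the \u201cimages\u201d are single numbers drawn from a normal distribution, the generator is a two-parameter affine map rather than a real neural network, the discriminator is simple logistic regression, and all hyperparameters are arbitrary.<\/p>\n\n\n\n

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" data: samples from a normal distribution the generator must imitate.
REAL_MEAN, REAL_STD = 3.0, 0.5
def sample_real(n):
    return rng.normal(REAL_MEAN, REAL_STD, n)

a, b = 1.0, 0.0   # generator: G(z) = a*z + b, a toy stand-in for a network
w, c = 0.1, 0.0   # discriminator: D(x) = sigmoid(w*x + c)
lr, batch = 0.02, 64

for step in range(3000):
    # --- train the discriminator: label real as 1, generated as 0 ---
    real = sample_real(batch)
    fake = a * rng.normal(0, 1, batch) + b
    s_real = sigmoid(w * real + c)
    s_fake = sigmoid(w * fake + c)
    # hand-derived gradients of binary cross-entropy, averaged over the batch
    w -= lr * (np.mean(-(1 - s_real) * real) + np.mean(s_fake * fake))
    c -= lr * (np.mean(-(1 - s_real)) + np.mean(s_fake))

    # --- train the generator: fool the discriminator (non-saturating loss) ---
    z = rng.normal(0, 1, batch)
    fake = a * z + b
    dG = -(1 - sigmoid(w * fake + c)) * w   # d(-log D(G(z)))/dG
    a -= lr * np.mean(dG * z)
    b -= lr * np.mean(dG)

samples = a * rng.normal(0, 1, 5000) + b
print(f"generated mean={samples.mean():.2f} (target {REAL_MEAN})")
```

Even at this toy scale, the same feedback dynamic plays out: the discriminator learns to separate real samples from generated ones, and the generator shifts its output until the two distributions become hard to tell apart.<\/p>\n\n\n\n<p>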
However, GANs were limited in what they could do, compared to more recent generative models.&nbsp; GANs were difficult to train and usually only worked for one type of image generation, such as faces or landscapes.&nbsp; At the time it seemed amazing that entirely synthetic images of people who did not even exist could be generated and look very realistic.&nbsp; GANs have since been replaced with newer, more capable forms of image generation, and most newer generative models for images are based on diffusion, a completely different process.<\/p>\n\n\n\n<p><strong>Renewed Interest in AI<\/strong><\/p>\n\n\n\n<p>After the second \u201cAI Winter,\u201d the topic of artificial intelligence began to see renewed interest in the 2010s. Companies like Google, Facebook and others had already seen strong results using machine learning for ranking, recommendations, and system optimization, gaining expertise in the area in the process.&nbsp; While these were not generative systems, they proved something critical: machine learning worked at scale and created real value.&nbsp; Meanwhile, GPU computing, big data and cloud infrastructure were opening up new possibilities.<\/p>\n\n\n\n<p>This success drove a new wave of ambition. Organizations began investing more heavily in AI as a core capability rather than a supporting tool. Groups such as Google Brain and DeepMind were formed, and OpenAI, initially a research nonprofit, was established with the goal of advancing artificial intelligence more broadly.&nbsp; More breakthroughs came in the 2010s, and what had once been a largely academic field began to shift into a major industry focus.<\/p>\n\n\n\n<p>The early success with deep learning, the advent of powerful infrastructure and the involvement of big players all created the modern AI development ecosystem and set the stage for the development of generative AI.<\/p>\n\n\n\n<p><strong>The Challenge of Language Processing<\/strong><\/p>\n\n\n\n<p>The nature of language itself makes it difficult to model in traditional neural networks.&nbsp; Language is sequential, contextual and highly dependent on relationships between elements.&nbsp; A single word can change the meaning of a sentence dramatically, and words may affect parts of a statement even when they are some distance away in the sequence.&nbsp; For example, knowing what item a pronoun refers to may require understanding objects described earlier in a statement.&nbsp; The structures and dependencies of language are especially difficult to process in a neural network, because these relationships vary so much from one statement to another.<\/p>\n\n\n\n<p>Early neural approaches to language processing relied heavily on recurrent neural networks (RNNs) and later Long Short-Term Memory (LSTM) networks. 
These architectures were designed to handle sequences by maintaining a form of internal state, allowing them to process input over time.&nbsp; In an LSTM, words are processed one at a time, with learned gates deciding which information to keep and which to discard, based on how likely it is to remain relevant to the output.&nbsp; These early language processing networks could perform some useful linguistic tasks.&nbsp; However, training was slow and capabilities were limited.&nbsp; These models only worked well on short, simple sequences of text.<\/p>\n\n\n\n<p>Despite the difficulty of processing language, there has always been a strong incentive to improve language processing in AI.&nbsp; Language is the natural way that humans communicate, and the ability to process it has obvious utility.&nbsp; Language is the most compressed form of cognition; it allows humans to \u201cthink out loud,\u201d expressing abstract concepts and reasoning through real-world situations.&nbsp; The ability to understand language is necessary to respond to communications or gather information from others, but it remained an entirely human domain. 
For example, if a company wanted to get a good idea of how customers were talking about its product online, that\u2019s something that would require dedicated human labor to investigate.<\/p>\n\n\n\n<p>Due to the limits of recurrent neural networks in processing language, a new mechanism known as \u201cattention\u201d was introduced to neural networks in 2014.&nbsp; Attention allows for the comparison of data across a sequence of discrete \u201ctokens\u201d to determine which ones are the most important to the output.&nbsp; Attention-based neural networks showed the ability to process long sequences of data, such as language, without losing the meaning of long and complex statements the way earlier neural networks did.&nbsp; The attention mechanism was further refined to better map word relationships and dependencies, improving language processing further.<\/p>\n\n\n\n<p><strong>Attention Is All You Need: The Revolutionary Paper<\/strong><\/p>\n\n\n\n<p>In 2017, the paper \u201cAttention Is All You Need\u201d was published.&nbsp; Although it was not apparent at the time, the ideas introduced by the paper would revolutionize how certain types of data, particularly language, would be processed in neural networks.&nbsp; The architecture it proposed, the transformer, is the basis for familiar chat-based and generative AI services like Google Gemini, ChatGPT and Anthropic\u2019s Claude.&nbsp; It is what made generative language processing of this type possible.<\/p>\n\n\n\n<p>The paper made a bold proposal.&nbsp; Transformers are neural networks which model sequential data, but they do it without recurrence or other prior techniques.&nbsp; Instead, the transformer uses only the process of attention to compare data across a sequence and detect patterns.&nbsp; Unlike previous models, transformers process the entire sequence in parallel, all at once.&nbsp; This kind of processing only became feasible because of the power of modern 
GPUs.<\/p>\n\n\n\n<p>Transformers were proposed as a solution to the problem of natural language processing.&nbsp; At the time of publication, the focus was not on two-way chat or on generating the long, complex and original output that today\u2019s chatbots produce.&nbsp; Instead, the paper addressed the far more mundane problem of translating text from one language to another.<\/p>\n\n\n\n<p>It quickly became apparent that transformers have a number of advantages over previous architectures.&nbsp; Transformers do not struggle to maintain long-range relationships between words, as previous models did.&nbsp; Transformers were also discovered to be useful for other types of sequence modeling, such as audio, radio signals, and music scores.<\/p>\n\n\n\n<p>The earliest practical use of the transformer to model language at scale was a system known as BERT, which stands for Bidirectional Encoder Representations from Transformers. BERT was developed by Google in 2018 and represented one of the first major demonstrations that transformer-based architectures could be used effectively for real-world language tasks.<\/p>\n\n\n\n<p>Unlike earlier models, BERT was designed to understand language context in both directions simultaneously. Rather than reading text strictly left-to-right, it could evaluate relationships between words across an entire sentence at once. This allowed it to perform much better on tasks such as search relevance, question answering, and sentence classification. 
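<\/p>\n\n\n\n<p>The computation at the heart of every transformer, BERT included, is scaled dot-product attention.&nbsp; The sketch below shows the core formula in plain Python with NumPy.&nbsp; It is a simplified illustration: in a real transformer the query, key and value projections are learned during training, while here random matrices stand in for them.<\/p>\n\n\n\n

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row max for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # similarity of every token to every other token
    weights = softmax(scores, axis=-1)  # each row is a probability distribution
    return weights @ V, weights

rng = np.random.default_rng(42)
tokens = rng.normal(size=(4, 8))        # 4 token embeddings of dimension 8
# Random projections stand in for the learned Wq, Wk, Wv weight matrices.
Wq, Wk, Wv = rng.normal(size=(3, 8, 8))
output, weights = attention(tokens @ Wq, tokens @ Wk, tokens @ Wv)
print(weights.round(2))                 # 4x4 matrix of attention weights
```

Each row of the resulting weight matrix sums to one and describes how strongly one token \u201cattends\u201d to every token in the sequence\u2014exactly the all-at-once comparison that lets the model relate words regardless of how far apart they sit.<\/p>\n\n\n\n<p>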
In practical terms, BERT dramatically improved the ability of machines to interpret the meaning of human language.<\/p>\n\n\n\n<p>BERT is still widely used today.&nbsp; It does not generate responses to text, but it can analyze text and determine things like sentiment, urgency and question type.&nbsp; It can classify sentences and statements and perform other types of linguistic analysis.&nbsp; What BERT demonstrated is that even complex and highly subjective concepts can be read from language using pure statistical analysis.<\/p>\n\n\n\n<p><strong>The Transition to Large Language Models and GPTs<\/strong><\/p>\n\n\n\n<p>If statistical models could read language this well, an obvious question followed: could they also write it?&nbsp; That question led directly to the development of large language models, or LLMs. While Google had pioneered much of the early transformer work, organizations like OpenAI began to take interest. OpenAI, at the time, was a small development lab and operated as a nonprofit, developing open-source AI products, hence the name.&nbsp; They invested heavily in the concept of transformer-based language modeling.<\/p>\n\n\n\n<p>The generative pretrained transformer was designed to complete sequences of words with the most likely next word.&nbsp; This is a relatively simple starting point.&nbsp; Autocomplete is a familiar technology to anyone who has used a mobile phone to text in the past twenty years.&nbsp; It aims only to predict the next word, but because transformers can perform multiple levels of deep analysis on a sequence, it was theorized that next token prediction could be extended to generate language.&nbsp; Earlier attempts had used a similar approach and were successful for relatively short and simple sequences. 
&nbsp;<\/p>\n\n\n\n<p>How next word prediction can be extended to create whole sentences, and later paragraphs or whole stories, is surprisingly simple: after the model predicts a word, it is appended to the sequence and the entire model is run again on the entire sequence.&nbsp; Word after word is appended this way until the model predicts a \u201cstop token\u201d which indicates that the statement is finished.<\/p>\n\n\n\n<p>The first model designed to do this was GPT-1. &nbsp;GPT-1 was released in 2018 by OpenAI with the paper \u201cImproving Language Understanding by Generative Pre-Training.\u201d&nbsp; It was trained to predict the next word based on the BooksCorpus dataset, a dataset of about seven thousand books. With 117 million parameters, GPT-1 was much smaller than current GPT models.&nbsp; GPT-1 was not a practically useful product, but a proof of concept.&nbsp; It demonstrated the ability to paraphrase and the ability to determine the directional relationship between fragments of text.&nbsp; While this may sound simple, at the time, it outperformed all previous methods of text analysis and generation.<\/p>\n\n\n\n<p>The success of GPT-1 as a demonstration led to the development of GPT-2.&nbsp; GPT-2 scaled the process further, eventually reaching 1.5 billion parameters. &nbsp;This was enormous for the time and pushed the bounds of hardware capabilities. It has been described as a \u201cdirect scale up\u201d of GPT-1.&nbsp; It also increased the diversity of the training set, adding millions of webpages to the corpus of books GPT-1 used.&nbsp; GPT-2 was released in 2019, and, unlike GPT-1, it could perform useful tasks and was more than just a demonstration.<\/p>\n\n\n\n<p>GPT-2 could produce paragraphs that were grammatically correct, logically consistent, and stylistically appropriate. It could follow prompts and continue them in ways that made sense to a human reader. 
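<\/p>\n\n\n\n<p>The generation loop described above, predict a word, append it, and run the model again, can be sketched in a few lines of Python.&nbsp; Here a trivial bigram lookup table built from a two-sentence toy corpus stands in for the transformer; the corpus, the words and the stop token are all invented for illustration.<\/p>\n\n\n\n

```python
from collections import Counter, defaultdict

# Toy "training data" and a bigram table standing in for a real language model.
corpus = "the cat sat on a mat <stop> the cat ate a treat <stop>".split()
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(sequence):
    """Stand-in for the model: most frequent follower of the last word."""
    return follows[sequence[-1]].most_common(1)[0][0]

# The autoregressive loop: predict, append, run again on the whole sequence,
# and stop when the model emits the stop token.
sequence = ["the"]
while sequence[-1] != "<stop>" and len(sequence) < 20:
    sequence.append(predict_next(sequence))

print(" ".join(sequence))
```

A real GPT replaces the lookup table with a transformer predicting over tens of thousands of tokens, but the outer loop, append and re-run until a stop token, is exactly the same.<\/p>\n\n\n\n<p>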
It was not perfect, but it crossed a threshold.&nbsp; GPT-2 demonstrated the ability to generate entire essays, news stories and prose.&nbsp; Although the consistency and coherence were poor by modern standards, it was a breakthrough for the time.&nbsp;<\/p>\n\n\n\n<p>The way that users interacted with GPT-2 was very different from the way they interact with current language models.&nbsp; Rather than a two-way chat, with instructions and questions, GPT-2 would continue sequences of text based on the patterns it had learned.&nbsp; So, rather than saying \u201cWrite me an essay about World War 2,\u201d the user would start with \u201cWorld War 2 was the largest conflict in human history\u2026\u201d and the system would pick up from there, continuing the statement.<\/p>\n\n\n\n<p>Interestingly, experiments with GPT-2 led to more advanced prompting strategies. One was question and answer prompting.&nbsp; A user would prompt the system with a statement like \u201cQ: When did World War 2 end? A:\u201d&nbsp; The system would often pick up on the fact that this was a two-way exchange and answer correctly.&nbsp; Other prompting strategies included providing headlines or summaries of content. These types of experiments would lead to the chat interfaces that exist today.<\/p>\n\n\n\n<p>When OpenAI announced GPT-2, there was great outcry against its public release.&nbsp; Concerns were raised that it could be used to generate misinformation, propaganda and fraudulent content. 
In retrospect, the concern seems silly.&nbsp; By current standards, GPT-2 was a very primitive tool.&nbsp; The obvious rebuttal is that it was always possible to create propaganda, fraud and misinformation, and that automating the process doesn\u2019t change much, considering how much misinformation is generated already.&nbsp; However, this foreshadowed a recurring pattern: new AI capabilities are often framed as dangerous or met with moral panic.<\/p>\n\n\n\n<p>What was most important about GPT-2 was that it demonstrated that language modeling, at scale, can be used to generate long sequences of complex text through next word prediction.&nbsp; It proved that scale was an important factor in achieving these results and that output quality increased with scale. Most important was the demonstration that, with sufficient analysis and data, next word prediction can capture long and complex narrative coherence. Outputs could even exhibit structured logical reasoning and narrative completion.<\/p>\n\n\n\n<p>What is happening under the surface, however, looks nothing like human cognition, even if the output appears to be structured reasoning. Instead, it is pattern recognition and recreation, building a response based on the patterns of millions or billions of similar statements that the model has been trained on.&nbsp; From that, linguistic structure emerges and words are chosen based on whether they fit the patterns established for how explanations unfold, how arguments are made, how essays flow and how people converse with each other.<\/p>\n\n\n\n<p>One of the most remarkable discoveries was the ability of the model to apply patterns in ways that were not explicitly present in its training data. It could draw analogies, follow logical structures, and synthesize ideas. In these cases, the model was capable of taking the logic or structure of a statement and applying it to a different set of circumstances, or of replacing a word with an analogous one. 
This led to early observations of what would later be described as latent reasoning or emergent behavior.<\/p>\n\n\n\n<p>The most important conclusion from the development of GPT-2 was that capabilities and quality of output improve with model scale.&nbsp; That includes not only the size of the model itself, but also the amount of computing power applied to its optimization and the size of the training dataset.<\/p>\n\n\n\n<p><strong>GPT-3: Scaling and Instruction Tuning<\/strong><\/p>\n\n\n\n<p>This progression led to GPT-3, a significantly larger and more capable model that demonstrated a major leap in performance. The success of GPT-2 resulted in vastly more resources being invested in GPT-3.&nbsp; The training of the model scaled the process to the maximum level feasible at the time.&nbsp; Rather than being trained on a portion of the internet, GPT-3 was trained on scrapings of nearly the entire public internet, as well as millions of news articles, conversations, transcripts and books.<\/p>\n\n\n\n<p>GPT-3 was able to perform a wide variety of tasks with minimal prompting, including writing, summarization, coding, and question answering.&nbsp; However, in its original form, the model was still primarily a text completion engine.&nbsp; It was very good at continuing text, but did not behave like an instruction-following or conversational system.<\/p>\n\n\n\n<p>To address this, a process known as instruction tuning was introduced. Models were trained not just on raw text, but on structured examples of prompts and responses. 
The process works by first pre-training the model on a huge corpus of data and then fine-tuning the model to structure its output in the form of responses to prompts.<\/p>\n\n\n\n<p>To do this, OpenAI fed its model millions of examples of questions and answers, greetings and responses, instructions and responses.&nbsp; The model was shown numerous examples of instructions requesting that essays be generated, lists be alphabetized, conversations be continued and statements be evaluated.&nbsp; Drawing on its previous training, with examples of almost every type of text imaginable, the model learned to respond to users\u2019 requests and even maintain coherent two-way conversations.<\/p>\n\n\n\n<p><strong>The Release of GPT-3 as ChatGPT<\/strong><\/p>\n\n\n\n<p>The moment that really changed the world was the release of ChatGPT.&nbsp; The technology of generative pretrained transformers had been demonstrated by OpenAI, but the capabilities were so remarkable and transformative that it became clear that the public and enterprises would need to test-drive the technology to fully appreciate what it could do.&nbsp; OpenAI made the bold move of introducing a public-facing product with ChatGPT in late 2022.<\/p>\n\n\n\n<p>ChatGPT offered a simple, intuitive and approachable way to interface with the model.&nbsp; It was truly amazing, because anyone could use it and immediately sense that this was a new capability.&nbsp; It\u2019s important not to underestimate the impact of years of science fiction setting the expectation that machine conversation was a hallmark of the future, yet a distant dream in reality.&nbsp; It\u2019s not surprising that this caused an immediate frenzy of excitement.&nbsp; So much so that, early in its release, OpenAI struggled to procure computing capacity fast enough to meet demand.<\/p>\n\n\n\n<p>One company that took notice immediately was Google.&nbsp; Although large language models and the power of natural language processing were poorly understood 
in the general public, Google was one of the few organizations familiar with the technology and its capabilities.&nbsp; Google had developed its own large language models, some of comparable capability, but had not released them publicly.<\/p>\n\n\n\n<p>For Google, large language models and natural language processing had previously been misfit technologies.&nbsp; The power of the models was clear, but their application in the Google ecosystem was less clear.&nbsp; Generative models could produce complex outputs, but their probabilistic nature made them problematic for Google, a company that consumers trust for accurate information.&nbsp; They also sat uneasily with Google\u2019s revenue strategy.&nbsp; Google makes most of its money from search, and advertising is key to that.&nbsp; A search can lead to multiple pages of results and many websites clicked, but a language model can present a lot of information in a concise statement, with less ad revenue opportunity.<\/p>\n\n\n\n<p>Recognizing the capabilities of large language models, Google\u2019s CEO, Sundar Pichai, issued a \u201ccode red\u201d e-mail to Google employees, expressing his fear that Google could lose the lead in AI development, due to OpenAI\u2019s model and its public availability. Although not obvious, the reason for such concern at Google extended beyond AI capabilities in general.&nbsp; The greater concern was that a move to natural language-based information retrieval threatened Google\u2019s core business as a search engine, and as the default landing page for anyone looking for information on the internet.&nbsp; For this reason, he ordered Google Labs to bring the technology to market as soon as possible.<\/p>\n\n\n\n<p>It\u2019s easy to overlook the fact that this is quintessentially opposed to how Google normally does business. 
Google has rarely been the first to market with major products.&nbsp; Google was not the first search engine, Gmail was not the first web-based email, Google Maps was not the first map service.&nbsp; Rather, Google has staked its reputation on well-engineered products that deliver value and consistency.&nbsp; The Google way has been to be the best, but not always the first.&nbsp; The fact that they made this move should illustrate how revolutionary they considered the technology to be.<\/p>\n\n\n\n<p>It\u2019s also very telling that Google\u2019s initial release of a natural language processing product came not in the form of a chatbot, but packaged as more of a search assistant, capitalizing on their already strong presence.&nbsp; However, there\u2019s really no hiding that Google\u2019s first generative products were rushed to production.&nbsp; That\u2019s why early versions were so prone to hallucination.&nbsp; This includes the notorious demonstration in which Google\u2019s new AI hallucinated a response, resulting in a selloff of roughly 100 billion dollars in market value.&nbsp; Thankfully, the stock price did recover.<\/p>\n\n\n\n<p>Google\u2019s AI offering started off very rough around the edges, but it was rapidly improved, even as the branding moved from \u201cBard\u201d to \u201cGemini.\u201d&nbsp; The tool also became far more integrated, gaining the capability to interface across Google\u2019s ecosystem.&nbsp; ChatGPT also improved as GPT-3 was upgraded to GPT-3.5 and then GPT-4.&nbsp; Each version introduced greater capabilities than the last.<\/p>\n\n\n\n<p>It did not take long for the field to widen.&nbsp; In 2023, Anthropic released Claude to the market, establishing what would become the \u201cBig Three\u201d of large language models and generative AI.&nbsp; They would be followed by smaller competitors and a growing open-source effort.&nbsp; There would also be an explosion of companies producing generative AI products, 
such as stability.ai, which focuses on image generation.&nbsp;<\/p>\n\n\n\n<p>Today there are a large number of open-source large language models, as well as research models and specialized models.&nbsp; A number of organizations, including Oracle, Nvidia, Microsoft and even financial services companies, have begun efforts to develop their own in-house large language models from the ground up.&nbsp; Others have leveraged open-source models, and still more have built products which license the technology from OpenAI, Google or Anthropic.<\/p>\n\n\n\n<p><strong>Multimodality, Images, Video and Sound<\/strong><\/p>\n\n\n\n<p>Natural language processing has been the area of generative AI that has received the most attention.&nbsp; It\u2019s not hard to understand why.&nbsp; Language can be used to accomplish a vast number of tasks, and it makes for an especially easy human interface. &nbsp;However, the generative revolution extends far beyond text.<\/p>\n\n\n\n<p>Generative Adversarial Networks were the first image generators. They were remarkable for their time, but usually only worked over a small domain.&nbsp; They have been largely replaced by diffusion models.&nbsp; Diffusion models work through a counter-intuitive mechanism.&nbsp; They start with an image of pure random noise, then use a model trained to remove noise to reveal an image.&nbsp; Although there is no image in the noise, the model must provide an output, so it constructs one.<\/p>\n\n\n\n<p>As with language models, diffusion models\u2019 capabilities increase vastly with scale. It\u2019s important that training images be well labeled, so that the model can associate image features with the words used to describe them.&nbsp; As with language models, current training sets are unimaginably vast, enabled by the sheer volume of images on the internet. 
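<\/p>\n\n\n\n<p>The denoising process described above can be demonstrated end to end in a few lines of Python.&nbsp; This is a deliberately simplified sketch: the \u201cimages\u201d are single numbers, and because the data distribution is a simple bell curve, the ideal denoiser can be written down exactly instead of being learned by a neural network; the noise schedule and step count are typical but arbitrary choices.<\/p>\n\n\n\n

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data distribution the sampler should reproduce: N(2.0, 0.5^2).
MU, SIGMA = 2.0, 0.5
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # noise schedule (noise added per forward step)
alphas = 1.0 - betas
abar = np.cumprod(alphas)            # fraction of original signal remaining at step t

def score(x, t):
    """Gradient of the log-density of the noised data. Normally this is what
    a trained network approximates; for Gaussian data it is exact."""
    mean = np.sqrt(abar[t]) * MU
    var = abar[t] * SIGMA**2 + (1.0 - abar[t])
    return -(x - mean) / var

# Reverse (sampling) process: begin with pure noise and denoise step by step.
x = rng.normal(0.0, 1.0, 5000)
for t in range(T - 1, -1, -1):
    noise = rng.normal(0.0, 1.0, x.shape) if t > 0 else 0.0
    x = (x + betas[t] * score(x, t)) / np.sqrt(alphas[t]) + np.sqrt(betas[t]) * noise

print(f"samples: mean={x.mean():.2f}, std={x.std():.2f} (data was mean=2.00, std=0.50)")
```

Starting from pure random noise and repeatedly applying the denoiser, the samples end up distributed like the original data, which is the essence of how a diffusion model \u201cfinds\u201d an image in the noise.<\/p>\n\n\n\n<p>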
&nbsp;This enabled them to produce remarkably detailed original images, synthesizing features to create fictional scenarios like a fuzzy orange elephant floating in space.<\/p>\n\n\n\n<p>However, diffusion alone has limitations. It excels at generating open-ended imagery, such as \u201ca beautiful landscape\u201d or \u201can astronaut on the moon,\u201d but struggles with precise spatial arrangements or structured compositions. As a result, many modern image generation systems use hybrid approaches, combining diffusion with other models or multi-stage pipelines.&nbsp;<\/p>\n\n\n\n<p>Image models have improved vastly, and today can include accurate text in images and create images not only from text prompts but also from reference images, such as photos or sketches.&nbsp; Many options exist, including Midjourney, DALL-E and offerings from Canva and Adobe.&nbsp; Moving beyond pure diffusion has allowed image generators to create complex charts, graphs and technical illustrations.<\/p>\n\n\n\n<p>Stable Diffusion is an excellent image generator for anyone looking to learn about diffusion or image generation in general.&nbsp; Unlike most tools, which are designed for simplicity, Stable Diffusion lets users adjust noise levels, iterations and other low-level variables.&nbsp; The learning curve is much steeper than with other generators, but for professionals and those looking to create advanced outputs, it offers unparalleled power.&nbsp; It can be used to modify, scale or denoise images, or to generate new ones.<\/p>\n\n\n\n<p>Similar techniques have been extended to video and audio. Models exist that can produce videos based on text prompts, images or sample videos.&nbsp; The capabilities are really remarkable, considering the accessibility and ease of use. Although video takes orders of magnitude more processing power than text, the vast collections of online videos and advanced processing techniques have made the training feasible. 
Today a simple text prompt can allow a user to generate a video with effects that would have been a challenge for the best Hollywood studios a generation ago.<\/p>\n\n\n\n<p>Similar transformations can be made to audio.&nbsp; Like video, it\u2019s possible to create fully original audio samples that represent things that would be almost impossible to generate otherwise. For example, creating a duet between Elvis Presley and Weird Al Yankovic is trivially easy and can be done by anyone with access to the internet.&nbsp; Of course, there are also a huge number of practical uses for this technology.<\/p>\n\n\n\n<p>What is clear is that, with sufficient training data, it\u2019s possible to create a generative model for almost any kind of information content desired.&nbsp; Blueprints, diagrams, computer code, audio, music scores, images, movies and even three-dimensional worlds can all be created by models trained on available data and prompted by descriptions of the desired content.<\/p>\n\n\n\n<p>Newer models are also multimodal.&nbsp; These models do not work in just one medium, but can interface and translate across media types.&nbsp; Most new large language models can now take things like images and files as inputs and analyze them.&nbsp; They can also generate images and sounds. These capabilities vastly expand what models can do, allowing, for example, verbal discussion of image data or generated descriptions of video.&nbsp; It\u2019s a huge increase in capability.<\/p>\n\n\n\n<p><strong>Reasoning Models<\/strong><\/p>\n\n\n\n<p>Large language models are pattern completers.&nbsp; They can give answers to questions, generate reports and essays and transform text.&nbsp; However, they are not capable of true cognitive reasoning.&nbsp; This limits their ability to solve certain types of problems, especially those that require being broken down into multiple steps. &nbsp;From a human perspective, this makes perfect sense. 
Some problems can\u2019t be solved intuitively, but require structured breakdown.<\/p>\n\n\n\n<p>Since large language models excel at following directions and can use words to construct logical statements, an obvious way to improve how models solve problems was to have them \u201ctalk it out.\u201d&nbsp; This exploits a unique property of language: language itself can be used as a cognitive framework.&nbsp; We are all familiar with the concept of \u201cthinking out loud\u201d or solving a problem through one\u2019s own internal dialog.&nbsp; Therefore, it made sense to try this technique in language models.<\/p>\n\n\n\n<p>This became known as \u201cchain of thought\u201d reasoning.&nbsp; Similar techniques would follow, modeling the mental problem-solving humans use.&nbsp; \u201cTree of thought\u201d explores multiple solution paths. Other techniques include multiple rounds of self-criticism, exploring multiple possible outcomes and so on.&nbsp; Models can be prompted to give multiple solutions, evaluate their potential, list disadvantages and reevaluate.&nbsp; This method of \u201ctalking out\u201d a problem is powerful.&nbsp; It\u2019s also a very real form of subjective reasoning that is now possible with computers, although it remains a form of pattern recognition and is not true awareness.<\/p>\n\n\n\n<p>Thought prompting is a powerful tool, but models are imperfect and lack the common sense of a human.&nbsp; Although verbal reasoning and simulated verbal cognition can work very well, they are prone to errors in ways humans are not.&nbsp; Large language models have difficulty detecting errors in their own output and will often expound on ridiculous assumptions or make basic reasoning errors.<\/p>\n\n\n\n<p>Newer models have been optimized for reasoning.&nbsp; Popular services like ChatGPT and Claude now use a basic internal verbal reasoning mechanism to pre-reason over some complex tasks, rather than
simply give the raw model output.&nbsp; This is slightly slower, but it allows far more dynamic output on tasks requiring logic or nuance.<\/p>\n\n\n\n<p>There have also been efforts to make models reason internally, known as latent space reasoning. The limited ability of large language models to solve novel tasks outside the domain of language has been of great interest.&nbsp; New models, such as hierarchical reasoning models and recursive reasoning models, are designed to do just that and are being incorporated into language and non-language processing.<\/p>\n\n\n\n<p><strong>Agentic AI:<\/strong><\/p>\n\n\n\n<p>Given the power of natural language processing to work through complex narratives and answer questions, a natural question was to what degree these models could operate autonomously.&nbsp; In the modern world, most administrative tasks can be accomplished with language alone, and IT systems respond to commands in the form of text.&nbsp; This raised the question of whether a large language model could self-prompt its way to task accomplishment.<\/p>\n\n\n\n<p>Early examples included having multiple language model instances prompting each other to accomplish tasks or running a simple loop, repeatedly prompting the model to evaluate a situation and act accordingly.&nbsp; Scheduled prompts could allow large language models to accomplish basic tasks at a pre-determined time, such as summarizing e-mails or reporting on news.<\/p>\n\n\n\n<p>Early experiments led to implementations of simple agentic frameworks.&nbsp; The first were basic and experimental.&nbsp; They used the large language model as a function, calling it automatically and prompting it to take the next step in executing a task or navigating a problem.&nbsp; Increasingly complex systems could respond to data and the environment.<\/p>\n\n\n\n<p>This led to fully agentic frameworks.&nbsp; Agentic AI is now a major area of the generative AI ecosystem.&nbsp; A number of agentic platforms exist.
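The loop at the heart of these frameworks can be sketched in miniature. The `call_llm` function and its scripted plan below are hypothetical stand-ins for a real model API call; a real framework would also execute each action against e-mail, files or other systems.

```python
# Minimal sketch of an agentic loop: the language model is treated as a
# function that, given a goal and the history so far, returns the next action.

def call_llm(goal, history):
    # Stand-in policy: a real model would reason in text over the goal and
    # history. This scripted sequence just illustrates the control flow.
    plan = ["search('quarterly sales')", "summarize(results)", "DONE"]
    return plan[len(history)]

def run_agent(goal, max_steps=10):
    history = []
    for _ in range(max_steps):
        action = call_llm(goal, history)
        if action == "DONE":        # the model signals task completion
            break
        history.append(action)      # record (and, in a real system, execute) it
    return history

print(run_agent("report on quarterly sales"))
```

The `max_steps` cap is one of the simplest guardrails: it keeps a confused model from looping forever.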
One of the best known is Copilot, Microsoft\u2019s tool for automation and content generation in its ecosystem.&nbsp; Agentic platforms and frameworks take many forms and are being refined continuously.&nbsp; One source of confusion is that some vendors call any customized text generation an \u201cagent,\u201d when the term really implies autonomous operation.<\/p>\n\n\n\n<p>Agents are increasing in capability, but continue to suffer from some reliability issues.&nbsp; For example, the platform Openclaw lets users build agents that can interface with numerous systems, such as e-mail, local files and online accounts.&nbsp; While this makes the platform very capable and powerful, it has also resulted in errors like files being deleted or unauthorized transactions.&nbsp; This highlights a fundamental problem with current agentic frameworks: LLMs are probabilistic and language is ambiguous.<\/p>\n\n\n\n<p>Still, agentic AI continues to grow in capability and importance. Newer frameworks offer better guardrails, which are a vital addition, along with improved task execution methods, and progress in the field is rapid.<\/p>\n\n\n\n<p>Creating new and customized agents is also extremely easy and user-friendly.&nbsp; There are agentic frameworks that allow for the creation of advanced agents with coded and engineered features, but many frameworks also allow end users to create their own agents by simply describing the tasks they want accomplished in plain language.<\/p>\n\n\n\n<p><strong>RAG and Tool Calling<\/strong><\/p>\n\n\n\n<p>Large language models are good at some things and not others.&nbsp; For example, despite their excellence at verbal conversation, large language models are terrible at even basic mathematics.
That\u2019s because arithmetic is not a linguistic task and requires that the proper operations actually be performed, not just simulated through pattern matching.&nbsp; For this reason, early models often output the wrong answer to basic math problems.&nbsp; This led to the obvious conclusion of \u201cteach the model to use a calculator.\u201d<\/p>\n\n\n\n<p>Since then, this idea has been vastly expanded on.&nbsp; Models can now use basic tools like calculators, scratch pads, code execution environments and other engines to improve their accuracy and do things that language models alone are poor at.&nbsp; These capabilities are constantly improving, and new tools are continually being integrated.<\/p>\n\n\n\n<p>One of the earliest uses of external data is RAG, retrieval-augmented generation.&nbsp; This is an extension of the model\u2019s ability to reformat and paraphrase text it is provided with.&nbsp; By allowing the model to connect to external data sources, including searching the web, it can process information that is not in the training data.
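A minimal sketch of the retrieval step may help here. The sample documents and the keyword-overlap scoring below are illustrative assumptions; production systems typically use vector embeddings and a search index, but the shape is the same: find relevant text, then prepend it to the prompt so the model is grounded in a known reference.

```python
# Minimal sketch of retrieval-augmented generation (RAG): pick the document
# most relevant to the question and place it in the prompt as context.

DOCUMENTS = [
    "The 2026 budget allocates 40% of funds to security tooling.",
    "Diffusion models generate images by iteratively removing noise.",
]

def retrieve(question):
    # Toy relevance score: count how many words the question shares with
    # each document. Real systems use embeddings or full-text search.
    q_words = set(question.lower().split())
    return max(DOCUMENTS, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question):
    context = retrieve(question)
    return f"Use this context to answer.\nContext: {context}\nQuestion: {question}"

print(build_prompt("How much of the 2026 budget goes to security tooling?"))
```

The model never sees the whole document store, only the retrieved context, which is what keeps its answer tied to a trusted source.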
This allows it to process things like recent news or data from live sources.<\/p>\n\n\n\n<p>RAG is also vital to ensuring accuracy.&nbsp; Large language models are trained to create plausible pattern-based outputs.&nbsp; As a result, they can sometimes confabulate facts, a failure known as \u201challucination.\u201d&nbsp; This is a major risk in high-stakes environments.&nbsp; RAG grounds a model in a known reference.&nbsp; It reduces, although does not eliminate, the risk of hallucinations and helps ensure the model is pulling from trusted sources.<\/p>\n\n\n\n<p>What makes RAG so powerful is that it allows highly customized behavior from LLMs without fully customizing the model itself.&nbsp; Models can be provided with complex instructions, scripts and background data.&nbsp; This enables the creation of highly specialized interfaces and chatbots, without having to rewrite the entire model.<\/p>\n\n\n\n<p>Model Context Protocol, or MCP, is a powerful new standard for RAG and tool integration. It is a universal protocol for models to call external tools, data sources and services.&nbsp; MCP replaces complex API calls with an industry standard, making it easy to connect models to a variety of external services.<\/p>\n\n\n\n<p>MCP is just one example of how natural language processing is moving from being a generative technology to an interface layer. Because text is the universal language of command execution, it\u2019s possible to use natural language processing as the user interface to technology.
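The pattern behind this interface layer can be sketched as a simple dispatcher: the model emits a structured request naming a tool, and the host program executes the real operation. The JSON shape and the calculator tool below are illustrative assumptions, not any particular vendor's API; in practice the request would come from a model, and a standard like MCP defines the exchange.

```python
import json

# Minimal sketch of tool calling: the model's output names a tool and its
# arguments; the host program performs the actual operation and returns
# the result, which would normally be fed back to the model.

TOOLS = {
    # Toy calculator: eval with builtins stripped. Illustration only;
    # a real system would use a proper expression parser.
    "calculator": lambda expr: eval(expr, {"__builtins__": {}}),
}

def handle(model_output):
    request = json.loads(model_output)   # e.g. {"tool": "...", "args": "..."}
    tool = TOOLS[request["tool"]]
    return tool(request["args"])

# What a model might emit when asked "what is 17 * 23?"
print(handle('{"tool": "calculator", "args": "17 * 23"}'))
```

This division of labor is the whole point: the model decides *which* operation is needed, and deterministic code performs it correctly.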
A person can talk to the system and it can execute their commands.&nbsp; The implications are as revolutionary as the GUI taking over from command-line input.<\/p>\n\n\n\n<p><strong>Coding Models<\/strong><\/p>\n\n\n\n<p>One thing that models excel at is the generation of programmatic code.&nbsp; As far as tasks go, it\u2019s hard to think of one better suited to language modeling.&nbsp; Programmatic code follows strict rules and patterns, and there are vast collections of open-source implementations and definitive standards guides to train on. Computer programs reuse the same logic and structures.&nbsp; Even across different programming languages, the logic remains the same.<\/p>\n\n\n\n<p>Although natural language is far more ambiguous and non-deterministic, it also conveys instructions and can be used to describe tasks.&nbsp; Programming a computer is literally a linguistic task.&nbsp; In many cases, the basic process of what a program does is simple and straightforward, and can be expressed verbally.&nbsp; Translating it to code is the hard part for most people.&nbsp; It requires proficiency in the programming language.<\/p>\n\n\n\n<p>Even for experienced programmers, creating a new program, even a simple one, can be a grind.
Programming is exceptionally sensitive to things like typographical errors or misnamed variables, and programs can get long and complex.&nbsp; Even simple programs can take a long time to write and debug.&nbsp; For this reason, programmers have always relied on templates, libraries and code examples.<\/p>\n\n\n\n<p>Language models are a natural extension of this.&nbsp; If a person can describe what they want a program to do, or how they want code modified, a model can absolutely produce the code.&nbsp; This has truly revolutionized the field of computer programming and web development.<\/p>\n\n\n\n<p>Newer models lean heavily on this ability.&nbsp; Claude Code and others are heavily optimized for the task and trained on vast quantities of code.&nbsp; Newer models now operate in coding environments, which allow file creation, iterative development, sandbox execution and branched development.&nbsp; It\u2019s a vastly more capable way of generating code than in a chat window.&nbsp; Coding models are increasingly being integrated into complex development environments.<\/p>\n\n\n\n<p>It should be noted that this is not without risk.&nbsp; Models can and do create errors in code, just as they hallucinate elsewhere.&nbsp; They can also pick up bad coding habits from examples of sloppy work in open-source code repositories.&nbsp; \u201cVibe coding\u201d is the act of coding by just describing what one wants.&nbsp; It can be a problem for high-risk workflows, where individuals deploy code they don\u2019t know how to properly inspect for errors.&nbsp; The problem with program code is that while errors may seem obvious (the code either runs properly or it doesn\u2019t), this is often not the case.&nbsp; Major problems like security vulnerabilities may not show up in testing.<\/p>\n\n\n\n<p>Coding environments are improving.&nbsp; Models that self-enforce high coding standards, along with sandbox execution and automated inspection, can reduce these risks.&nbsp; Still,
a large number of vibe coding errors have shown up in the wild, illustrating the need to improve controls in the overall workflow pipeline.<\/p>\n\n\n\n<p><strong>Wrappers and API Products<\/strong><\/p>\n\n\n\n<p>As mentioned earlier, foundation models, such as GPT, Claude and offerings from Google, can be incorporated into other products.&nbsp; These products act as \u201cwrappers\u201d for frontier models.&nbsp; They can connect by API, or application programming interface.&nbsp; Frontier labs offer this for a small fee, and any developer can gain access. Frontier models like GPT offer capabilities that are hard to find elsewhere, and the price of access is small compared to running a large open-source model on a cloud GPU.<\/p>\n\n\n\n<p>Many natural language processing products work this way.&nbsp; It\u2019s the easiest, lowest-friction way for developers to gain access to the technology.&nbsp; This has enabled a vast number of special variations, ranging from fictional character creators to legal assistants to text-parsing services.&nbsp; These AI products are frequently a custom-instructed version of GPT operating behind the scenes, with the output piped to the program\u2019s interface.<\/p>\n\n\n\n<p>This has also allowed organizations to customize and rebrand chat interfaces for all manner of purposes. The ability to drop natural language processing, as a component, into almost any product or workflow has changed what can be done with automation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have decided to publish a second rough draft of a book chapter.
One of my primary reasons is to get the content out as soon as possible, since my motivation is to help with the extreme misinformation being circulated &hellip; <a href=\"https:\/\/cybersecuritysanity.com\/?p=827\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[4,150,2],"tags":[116,126,181,146,185,203,138,20,37],"class_list":["post-827","post","type-post","status-publish","format-standard","hentry","category-ai","category-announcements","category-risk-management","tag-ai","tag-artificial-intelligence","tag-book","tag-doom","tag-education","tag-engagement","tag-ml","tag-risk","tag-risk-management"],"aioseo_notices":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/cybersecuritysanity.com\/index.php?rest_route=\/wp\/v2\/posts\/827"}],"collection":[{"href":"https:\/\/cybersecuritysanity.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cybersecuritysanity.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cybersecuritysanity.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/cybersecuritysanity.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=827"}],"version-history":[{"count":1,"href":"https:\/\/cybersecuritysanity.com\/index.php?rest_route=\/wp\/v2\/posts\/827\/revisions"}],"predecessor-version":[{"id":828,"href":"https:\/\/cybersecuritysanity.com\/index.php?rest_route=\/wp\/v2\/posts\/827\/revisions\/828"}],"wp:attachment":[{"href":"https:\/\/cybersecuritysanity.com\/index.php?rest_rout
e=%2Fwp%2Fv2%2Fmedia&parent=827"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cybersecuritysanity.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=827"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cybersecuritysanity.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=827"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}