How Memory Works

Have you ever sat through a documentary that you thought was really interesting, and then found yourself struggling to explain what you learned after the fact? Or think back to when you last started a new job. There was almost certainly a time in those first few days of orientation when you were exhausted, and just couldn't absorb any more information. Neither of those outcomes is a "you" problem. This is a well-established challenge related to how our brains remember (or don't remember!) information. The good news: If we can better understand how memory works, we can make smarter instructional design decisions that actually help information stick.

In 1968, psychologists Richard Atkinson and Richard Shiffrin proposed what is now called the multi-store model of memory (also known as the Atkinson–Shiffrin model). It describes memory not as a single location where information lives, but as a flow through three distinct stages: sensory memory, short-term memory, and long-term memory. Each form of memory has different characteristics: how much information is held, how long the information is held for, and what happens to information that's deemed irrelevant. Over the years, further research has refined the details within this model (most importantly, our understanding of short-term memory), but the core ideas hold up well and remain foundational to how we think about learning and instruction.

Three types of memory

Sensory memory is the first memory "bucket" that manages new incoming information. Every sight, sound, taste, smell, and touch is briefly held in a separate temporary bucket for each sense. To demonstrate, as you're sitting right now, you're probably not consciously thinking about how your chair feels, or about the whirring sound of the fan beside you, or about the headlights of the car passing by outside. Your sensory memory is most likely filtering all this distracting information out. Sensory memory has a huge capacity, but it holds on to each bit of information for only a very short period of time. If a learner's attention doesn't focus on (i.e., attend to) a piece of information at this stage, that information is quickly discarded.

Short-term memory is next up. It receives what attention selects and passes on from sensory memory. Later research reframed this stage as a multi-component working memory system. Rather than a single container, working memory has three components:

  • a phonological loop that handles verbal and auditory information (note: "phonological" refers to the sounds of language)
  • a visuospatial sketchpad that handles visual and spatial information, and
  • a central executive that coordinates information from the other two.

This matters because it means learners aren't processing verbal and visual information through the same channel. They have some capacity to handle both types of information simultaneously, if the material is designed with that in mind, but there's going to be some information loss. We'll cover this idea in more detail in the next post.

What working memory cannot do is hold a lot of information at once. Its capacity is severely limited, and information held here without rehearsal (i.e., actively thinking about it) fades within seconds. Working memory can easily become overwhelmed with too much information, which we'll discuss in detail soon. Rehearsal, which includes actively thinking about information, repeating it, recalling it, or applying it to new situations, improves our ability to store information in long-term memory and retrieve it later. So, the longer a learner thinks about new information at the short-term memory stage, the more likely they'll remember it in the future.

Long-term memory is the final stage, where information is kept for the long term. As far as we know, it has unlimited capacity, and information can be stored indefinitely, as long as we occasionally access it. Information stored here is indexed by meaning and connected to what we already know, rather than filed by its surface characteristics. For example, a nurse doesn't store the word "tachycardia" based on the way it sounds. Instead, she stores it as part of a broader network of meaning: elevated heart rate, possible causes, clinical significance, and what to do about it. Atkinson and Shiffrin proposed that rehearsal is the primary mechanism for transferring information from short-term storage into long-term memory, but also noted that connecting new material to existing knowledge (via schemas) in long-term memory produces stronger encoding than repetition alone.

Where schemas come in

As you'll recall from my last post, schemas are the organized knowledge structures we store in long-term memory. A well-developed schema in long-term memory guides what sensory memory pays attention to. Schemas provide organized frameworks that increase the effective capacity of working memory via chunking (i.e., grouping related items into a single meaningful unit). And having that schema in place accelerates how quickly new, related information gets stored in long-term memory because there is already a label in place to easily "stick" the new information to.

To illustrate: an experienced nurse who is entering a room to help with a code blue resuscitation isn't processing each individual alarm, each medical order, and each visual cue as separate items competing for space in working memory. She has a well-developed resuscitation schema that chunks all the elements of that environment into recognizable patterns, freeing her working memory to focus on unusual symptoms or urgent tasks. In contrast, a novice nurse wouldn't have established that schema yet, so a lot of sensory information is given attention and competes for the same limited workspace in the working memory. The novice nurse will get overwhelmed quite quickly and might not be able to engage in higher level thinking.
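The chunking effect in the nurse example can be sketched as a toy model in Python. This is only an illustration under loud assumptions: the slot limit of 4, the cue names, and the functions `chunk` and `is_overloaded` are all hypothetical, and real working memory is not a simple list with a fixed number of slots. The cues are made up for the example and are not clinical guidance.

```python
from typing import Dict, List

# Hypothetical limit on how many units working memory can hold at once.
# The post only says capacity is "severely limited"; 4 is an illustrative number.
WORKING_MEMORY_SLOTS = 4


def chunk(items: List[str], schema: Dict[str, str]) -> List[str]:
    """Replace each item with its schema label when one exists, merging duplicates."""
    units: List[str] = []
    for item in items:
        unit = schema.get(item, item)  # items with no schema stay as separate units
        if unit not in units:
            units.append(unit)
    return units


def is_overloaded(units: List[str], slots: int = WORKING_MEMORY_SLOTS) -> bool:
    """True when there are more units than available slots."""
    return len(units) > slots


# Made-up cues a nurse might notice on entering the room.
scene = [
    "monitor alarm",
    "compressions count",
    "epinephrine order",
    "airway check",
    "defibrillator charging",
    "unusual rash",
]

# The experienced nurse's schema maps familiar cues onto one recognizable pattern.
resuscitation_schema = {
    "monitor alarm": "code blue routine",
    "compressions count": "code blue routine",
    "epinephrine order": "code blue routine",
    "airway check": "code blue routine",
    "defibrillator charging": "code blue routine",
}

novice_units = chunk(scene, schema={})
expert_units = chunk(scene, schema=resuscitation_schema)

print(len(novice_units), is_overloaded(novice_units))  # 6 True
print(len(expert_units), is_overloaded(expert_units))  # 2 False
```

In this sketch the novice has six separate items competing for four slots, while the expert holds only two units: the familiar routine and the one unusual cue that still deserves attention.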

This highlights why it's so important we build accurate, well-organized schemas in learners' long-term memory. It's not just a best practice in order to ensure effective recall of information. It can directly impact a learner's performance in an intense situation where quick thinking is required.

Cognitive load theory

John Sweller developed cognitive load theory (CLT) in the 1980s, building on what was then understood about the limits of working memory (Sweller, 1988). Cognitive load theory proposes that the total demand placed on working memory at any given moment comes from two sources.

Intrinsic load is the inherent complexity of the material itself. More complex information has a higher intrinsic load. For example, explaining ventricular septal defects carries a higher intrinsic load for a first-year nursing student than it would for a pediatric cardiologist. You can't eliminate intrinsic load, but you can manage it by sequencing content carefully and building foundational schemas before introducing more complex topics.

Extraneous load is the demand created by how material is presented rather than what it contains. Poorly organized slides, redundant text, irrelevant images, and noisy environments all add extraneous load without adding any learning value. Young et al. (2014) note in their guide on cognitive load theory and medical education that extraneous load is the primary target for instructional design improvements, and that even small reductions in extraneous load can meaningfully improve knowledge transfer.

A fictional nasogastric tube insertion policy, one with tons of information that may not be relevant to a new employee learning the procedure for the first time. There is also no visual context provided, forcing the learner to imagine what this text describes. Generally, I would recommend not using policy or procedure documents that look like this when initially teaching a clinical skill.

Now, you may have encountered older descriptions of cognitive load theory that include a third type, germane load, defined as the cognitive effort dedicated to building schemas. Sweller himself has since reconceptualized germane load as a function of whatever working memory capacity remains after intrinsic and extraneous load are accounted for, rather than an independent source of load (Orru & Longo, 2019). In other words, free (or leftover) working memory capacity and germane load are effectively the same thing, and we don't need to have germane load as a separate concept. This means the goal of good instructional design is straightforward: keep intrinsic load manageable and strip out extraneous load wherever you can.
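One informal way to write down this relationship is shown below. The symbols are my own shorthand rather than notation from Sweller or from Orru and Longo: C stands for total working memory capacity, I for intrinsic load, E for extraneous load, and G for the capacity left over for schema building. Treat it as a mental picture, not a measurement, since none of these quantities comes in exact units.

```latex
\[
I + E \le C, \qquad G = C - (I + E)
\]
```

Read this way, reducing E while keeping I manageable is the only route to a larger G, which is the design goal described above.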

That "new hire" feeling from your orientation should make more sense now. In most cases, your working memory wasn't failing you; you were in cognitive overload due to unfamiliar terminology, information-dense slides, and simply too much new content at once. That left little capacity for the kind of deep processing that moves information into long-term memory.

Takeaways

The multi-store model and cognitive load theory together offer a clear set of implications for educators.

  • Audit your sessions for extraneous load. Go through your training materials and ask what is adding cognitive demand without adding learning value. Redundant text on slides, unexplained jargon, and information presented faster than it can be processed are all targets for removal.
  • Sequence content to build schemas, not just cover material. Introduce foundational concepts before complex ones. Learners who have a schema to attach new information to will encode it faster and retain it longer than learners who are encountering everything at once.
  • Use both verbal and visual channels deliberately. Because working memory has separate subsystems for verbal and visual information, presenting complementary information in both formats in serial (rather than duplicating the same content across both formats at the same time) increases effective capacity without overloading either channel.
  • Space rehearsal over time. A single workshop asks working memory to do too much heavy lifting in too little time. Breaking content up across shorter sessions with retrieval practice in between gives long-term memory a chance to consolidate between exposures (Vagha et al., 2025). If you can spread orientation across several weeks and mix in real practice on the floor, learners will retain considerably more than if everything is front-loaded at the start.
  • Treat novice and expert learners differently. An experienced learner with existing schemas can handle more complexity. Experienced learners also don't benefit from long explanations of things they already know; these just add extraneous load for them. A novice, on the other hand, needs very simple instructional content so their working memory isn't oversaturated.
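To make the spacing takeaway concrete, here is a minimal Python sketch that turns a first session date into a list of follow-up retrieval sessions. The function name `review_dates`, the start date, and the gaps of 1, 3, 7, and 14 days are all assumptions chosen for illustration. They are not taken from Vagha et al. (2025) or any other source cited here, so substitute whatever spacing fits your program.

```python
from datetime import date, timedelta
from typing import List


def review_dates(first_session: date, gaps_in_days: List[int]) -> List[date]:
    """Return the dates of follow-up retrieval sessions.

    Each gap is counted from the previous session, not from the first one.
    """
    dates: List[date] = []
    current = first_session
    for gap in gaps_in_days:
        current = current + timedelta(days=gap)
        dates.append(current)
    return dates


# Hypothetical orientation starting on 6 January 2025, with widening gaps.
schedule = review_dates(date(2025, 1, 6), [1, 3, 7, 14])
for session in schedule:
    print(session.isoformat())
# 2025-01-07
# 2025-01-10
# 2025-01-17
# 2025-01-31
```

Each follow-up would be a short retrieval practice session rather than a repeat of the original lecture, which keeps the load on working memory small at any one sitting.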

Each of these takeaways is grounded in research on multimedia learning theory, which we'll explore in the next post.


References

Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation (Vol. 2, pp. 89–195). Academic Press.

Orru, G., & Longo, L. (2019). The evolution of cognitive load theory and the measurement of its intrinsic, extraneous and germane loads: A review. In L. Longo & M. C. Leva (Eds.), Human mental workload: Models and applications (Communications in Computer and Information Science, Vol. 1012, pp. 23–48). Springer. https://doi.org/10.1007/978-3-030-14273-5_3

Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285. https://doi.org/10.1207/s15516709cog1202_4

Vagha, K., Choudhari, S. G., Taksande, A., Tembhurne, S., Vagha, J., & Vagha, S. (2025). Implementation of a spaced-repetition approach to enhance undergraduate learning and engagement in paediatrics. Frontiers in Medicine, 12, 1601614. https://doi.org/10.3389/fmed.2025.1601614

Young, J. Q., Van Merriënboer, J. J. G., Durning, S., & Ten Cate, O. (2014). Cognitive load theory: Implications for medical education: AMEE Guide No. 86. Medical Teacher, 36(5), 371–384. https://doi.org/10.3109/0142159X.2014.889290