Chapter 5 Operant Conditioning: Learning the Outcome of Behaviors
Behavioral Processes
OPERANT CONDITIONING
Thorndike's Law of Effect is the first statement about reinforcement.
Notice it is not really a behavioral definition: "satisfaction" and "discomfort" are mental states, not observable behaviors
His major contributions, however, were his laws of learning, the most important of which was his law of effect (Thorndike, 1911, p. 244):
"Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond."
Formula for operant is:
SD-->R-->SR
SD is a DISCRIMINATIVE STIMULUS
it "sets the occasion" or informs the organism that a response will be followed by a SR or reinforcer
SR is the reinforcer (see more info below)
R is the response
Images: rat and pigeon operant chambers (both informally called "Skinner boxes")
In operant conditioning a reinforced response is repeated (e.g., a hungry rat learns that pressing a lever produces food; eventually it presses the lever over and over until it is no longer hungry)
The paradigm for operant conditioning is:
SD-->R-->SR
where SD is a discriminative stimulus, R is a response, and SR is a reinforcer.
In operant conditioning, an animal must first make a response. Notice, that is not the case in classical conditioning.
That response is usually preceded by a discriminative stimulus, and
sometimes followed by a reinforcer.
Operant conditioning occurs when the association of response and
reinforcer causes the animal to make the response again later in a
similar situation.
The discriminative stimulus signals the animal that
a response at a given time is likely to be reinforced.
The response
MUST be made, for without it, the reinforcement will not occur.
Finally, the reinforcer has the property of making the response that
precedes it more likely to recur.
Example: Think of a traffic light.
It's a complex discriminative stimulus.
Green means go.
Red means stop.
Example: Imagine that you are in a future classroom and a green light appears on your console.
Were you to raise your hand and ask a question, you would get a $20 bill.
If you never raise your hand, no money appears.
But if you do raise your hand and get the money, you'll probably raise your hand again.
And when you raise your hand while the green light is off, nothing happens: no $20 bill.
Soon, the green light comes on, you raise your hand, and the money is dispensed.
Now, that green light has become a discriminative stimulus.
If it is on AND you raise your hand you'll get the money.
If it is off and you raise your hand nothing happens.
What do you think you'll do when that green light appears?
The apparatus used to study operant conditioning removes many
distractors.
An animal in an operant chamber will eventually direct
its attention to the manipulandum (e.g., the lever or key) that
activates the reinforcer.
So, operant conditioning can be defined as a procedure in which a
response followed by a stimulus recurs.
It recurs because of the
stimulus.
That stimulus is called a reinforcer.
Reinforcers are
stimuli that make the response that preceded them more likely to
recur.
Typical reinforcers include the giving of food or drink, and
the removal of shock or pain.
Other properties of
reinforcers will be covered below.
Discriminative stimuli set the occasion for a response and its
associated reinforcer to occur.
So again, a green traffic light is
the discriminative stimulus for the response of pressing on the
accelerator of your car.
But the facial expression of your boss may set
the occasion for you NOT to ask your boss for a raise (because of a
likely negative answer to your question).
Reinforcement increases the response
Positive reinforcement is delivered after the response and the response continues
e.g., you raise your hand in class and $20 comes out of your desk. What do you do? (Raise your hand again.)
Negative reinforcement: a stimulus is taken away after the response and the response continues
Good examples: raise umbrella when it's raining (you are no longer wet);
Take out trash when spouse nags you (nagging stops)
Positive reinforcement is when a response is followed by the addition
of a stimulus, and then that response is more likely to recur.
Negative reinforcement, on the other hand, is when a response is
followed by the removal of a stimulus and then that response is more
likely to recur.
Notice that negative reinforcement also makes the
response more likely to recur.
Let's revisit that hypothetical
classroom in the near future.
Now, when you sit at your desk, you are
subjected to electric shock. Whenever you are at your desk, you are
being shocked.
One day you ask a question, and the shock disappears,
briefly.
You ask another question, and it disappears briefly again.
Soon, you are asking a lot of questions.
Your question asking is also
being reinforced, but now by the removal of a stimulus, or by
negative reinforcement.
Discriminative Stimuli
SD-stimulus signaling opportunity for reinforcement, pigeon and green light
"Want a back rub?"
SΔ -stimulus signaling the lack of opportunity for reinforcement, frown
"I have such a headache" (SΔ is used to indicate that reinforcement will not follow.)
Discriminative stimuli are the first step in a reinforcement situation
SD -> R -> SR
Or, discriminative stimulus, followed by response, followed by reinforcement
Example: Red or green traffic lights
Shaping is reinforcing successive approximations of a final desired response.
In the video below, notice how Skinner gradually gets the pigeon to execute a full turn (the final desired response) by first reinforcing any left turn and later reinforcing only half turns.
The light tells the pigeon that reinforcement has been delivered.
Notice how quickly the pigeon makes the full turn.
Skinner could have just waited a long time for the pigeon to make a full turn and then reinforced it, but shaping makes that wait unnecessary.
Shaping, then, is a technique for getting to a final desired response more quickly.
See Skinner shaping a pigeon in class (below)
He rewards the pigeon with food for making more and more of a circle by turning to the left
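To see the logic of shaping as a procedure, here is a minimal sketch in Python. It assumes a toy model of the pigeon in which reinforcement enlarges the bird's typical turn; observe_turn, reinforce, and all the numbers are illustrative assumptions of mine, not Skinner's actual procedure.

    import random

    tendency = 60.0    # largest left turn (degrees) the bird makes at first

    def observe_turn():
        """Watch one response: how far left did the bird turn?"""
        return random.uniform(0, tendency)

    def reinforce():
        """Deliver food; reinforcement strengthens the reinforced behavior."""
        global tendency
        tendency = min(360.0, tendency + 60.0)

    criterion = 45.0   # the current approximation required to earn food
    while criterion < 360.0:
        if observe_turn() >= criterion:               # response meets the approximation
            reinforce()
            criterion = min(360.0, criterion + 30.0)  # demand a bit more next time
    print("Full turn shaped.")

The moving criterion is the whole trick: each reinforcement both strengthens turning and raises the bar for the next reinforcer, so the final response appears far sooner than if we simply waited for it.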
Primary: biologically relevant reinforcers such as food, water, pleasure, and pain (there are only a few primary reinforcers and they are, essentially, the same as UCSs)
Secondary: secondary reinforcers come to act as reinforcers when they are associated with primary reinforcers
Consider the following sequence in dog training: "sit" (dog eventually sits)->"good dog" (give a treat, repeat)
Eventually the dog will treat the words "good dog" as a reinforcer, a secondary reinforcer
Most human behavior is reinforced by secondary reinforcers
You, for example, will do things to increase your number of points in this class
I'd bet many of you would wash my car for 10 points added to your score, right? (Too bad that's unethical)
Money, by definition, is a secondary reinforcer (you can't eat it or drink it)
Secondary: associated with primary (usually via classical conditioning) and they ALSO act as reinforcers:
e.g., grades, money, praise (a large number of secondary reinforcers exist)
Primary reinforcers are biological.
Food, drink, and pleasure are the
principal examples of primary reinforcers.
But, most human
reinforcers are secondary, or conditioned.
Examples include money,
grades in schools, and tokens.
Secondary reinforcers acquire their power via a history of
association with primary reinforcers or other secondary reinforcers.
For example, if I told you that dollars were no longer going to be
used as money, then dollars would lose their power as a secondary
reinforcer.
Another example would be in a token economy.
Many therapeutic
settings use the concept of the token economy.
Remember, a token is
just an object that symbolizes some other thing.
For example, poker
chips are tokens for money.
Back in the day in New York City, subway tokens used to be
pieces of metal that could be inserted into the turnstiles of the
subway.
Image: old NYC subway token; at one point one was worth $1.25, so four tokens = $5.00, but only in NYC
Small debts were often paid off using tokens in New York because
of the token's value of one subway ride.
However, attempting to pay
off debts elsewhere using NYC subway tokens would not be acceptable.
In a token economy, people earn tokens for making certain
responses; then those tokens can be cashed in for privileges, food,
or drinks.
For example, residents of an adolescent halfway house may
earn tokens by making their beds, being on time to meals, not
fighting, and so on.
Then, being able to go to the movies on the
weekend may require a certain number of tokens.
Punishment decreases the response
NOTE: Punishment is NOT the exact opposite of reinforcement because punishment arouses negative emotions while reinforcement does not arouse positive emotions
Positive punishment is delivered after the response and the response stops
e.g., you raise your hand in that future classroom and you get a shock. What do you do? (Not raise hand again.)
Negative punishment is when something is taken away and the response stops
e.g., you raise your hand in class and points toward your grade are taken away (like a fine). What do you do? (Not raise your hand again.)
Good examples: child smarting off, take away phone, defensive football player delivers illegal hit, sent out of game. (What does player do in subsequent games?)
Punishment is when a stimulus that follows a response leads to a
lower likelihood of that response's recurring.
Note that
reinforcement, either positive or negative, leads to a higher
likelihood of a response recurring.
Positive punishment is when a response is followed by the addition
of a stimulus, and that response then occurs less often.
Note that this scene
sometimes occurs in real classrooms.
If a student asks a question and
then hears something like
"That's the stupidest question I ever
heard" from the instructor,
that student will likely not ask many
questions in the future.
Negative punishment is when a response is followed by the removal
of an already present stimulus, and that leads to that response's
occurring less often.
For example, in that future classroom again, if
you had to return one of your $20 bills every time you asked a question, you would
probably quit asking questions.
Although it might appear that reinforcement and punishment are
opposites, they are not. Punishment arouses negative emotions such as
fear and anger. But reinforcement does not similarly
arouse positive emotions like love, liking, and attraction.
Think of
jilted suitors who ask why their partners left.
They might wonder
why their partners left even though they gave their partners
expensive gifts.
Those expensive gifts did not, in and of themselves,
lead to love.
Skinner (1953) offered three reasons why punishments should not be administered:
they only work temporarily,
they create conditioned stimuli that lead to negative emotional reactions,
and they reinforce escape from the conditioned situation in the future.
He wrote (pp. 192–193):
Civilized man has made some progress in turning from punishment to alternative forms of control . . . But we are still a long way from exploiting the alternatives, and we are not likely to make any real advance so long as our information about punishment and the alternatives to punishment remains at the level of casual observation. As a consistent picture of the extremely complex consequences of punishment emerges from analytical research, we may gain the confidence and skill needed to design alternative procedures in the clinic, in education, in industry, in politics, and in other practical fields.
Schedules of Reinforcement
Skinner analyzed the effect of how reinforcers are delivered
He found that the scheduling of reinforcement made a big difference
There are many reinforcement schedules but the main ones are:
Fixed Interval
get a reinforcer after a fixed period of time (e.g., 10 seconds, one day, one week), provided a response is made
paycheck is an example
Variable Interval
get a reinforcer after a random period of time, provided a response is made
example: working for shady character who pays you on an irregular basis
Fixed Ratio
get a reinforcer after a fixed number of responses (e.g., 10, 50, 200)
piecework is an example (e.g., sewing garments, get paid for each one made)
Variable Ratio
get a reinforcer after a variable and random number of responses
gambling is good example, especially slot machines
Examples
The steeper the cumulative-record curve, the higher the response rate
One of the major discoveries of operant conditioning was that not
only do reinforcers have the power to cause responses to be made more
often, but that how and when those reinforcers are delivered also
affects the pattern of responses.
Controlling the how and when of
reinforcement is a reinforcement schedule.
Schedules are of two main types, time-based and response-based.
Time-based schedules usually contain the word interval, as in time
interval.
Response-based schedules usually contain the word ratio,
referring to the ratio of responses to reinforcers.
Fixed interval (FI) schedules reinforce the first response made after a given interval, measured from the preceding reinforcement.
A given interval is indicated by the addition of a number to the letters FI (seconds, usually).
Thus, in FI 15, the first response that occurs fifteen seconds or more after the preceding reinforcement is reinforced.
Variable interval (VI) schedules reinforce any response made after a variable amount of time.
A VI 20 would reinforce after an average of 20 seconds, not every 20 seconds.
Fixed ratio (FR) schedules deliver a reinforcer based upon
a constant number of responses.
For example, an FR-10 schedule would
deliver a reinforcer every 10th response.
Variable ratio (VR) schedules are similar to fixed ratio,
except that the number of responses required for a reinforcer changes
each time.
So, a VR-15 schedule would deliver a reinforcer after an
average of 15 responses, not on every 15th response.
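Each of the four schedules is just a decision rule: given a response at time t, does a reinforcer follow? Here is a minimal sketch in Python; the class names are my own, and it assumes the organism responds once per discrete time step.

    import random

    class FixedInterval:
        def __init__(self, interval):
            self.interval, self.last = interval, 0
        def reinforce(self, t):
            # first response at least `interval` after the last payoff is reinforced
            if t - self.last >= self.interval:
                self.last = t
                return True
            return False

    class VariableInterval:
        def __init__(self, mean):
            self.mean = mean
            self.next = random.expovariate(1 / mean)   # random wait, mean `mean`
        def reinforce(self, t):
            if t >= self.next:                         # first response after the wait
                self.next = t + random.expovariate(1 / self.mean)
                return True
            return False

    class FixedRatio:
        def __init__(self, n):
            self.n, self.count = n, 0
        def reinforce(self, t):
            self.count += 1
            if self.count == self.n:                   # every nth response pays off
                self.count = 0
                return True
            return False

    class VariableRatio:
        def __init__(self, mean_n):
            self.p = 1 / mean_n                        # pays off with probability 1/n
        def reinforce(self, t):
            return random.random() < self.p            # like a slot machine pull

    vr = VariableRatio(15)                             # a VR-15 "slot machine"
    payoffs = sum(vr.reinforce(t) for t in range(10000))
    print(f"10,000 pulls paid off {payoffs} times (expect about 667)")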
Let's examine some everyday examples of reinforcement schedules
and their effects.
A paycheck is a good example of an FI schedule.
Workers get a check once a week, for example, if they show up and
work. They do not get rewarded for working harder, or penalized for
working less.
Workers who work by the piece or by the job, piecework,
are paid more if they produce more, and are paid less if they produce
less.
Piecework is an example of an FR schedule.
Workers typically
work harder on FR schedules than they do on FI schedules.
Gambling is the classic example of a VR schedule.
Part of the
allure of gambling is its uncertain payoff.
Imagine a slot machine
that paid off every 10th time; only the 10th pull would be exciting.
A real slot machine, on the other hand, pays off on a random basis,
so each pull is exciting.
VR schedules maintain behavior at very high
rates.
Gambling is also addictive.
About the best everyday example of a VI schedule that I can think
of is working for a shady character.
This person pays you, but you
never know when payday is going to be.
It could be a week, two weeks,
a month.
So, you don't work very hard.
You would probably jump to
another job if the pay were the same but given regularly.
B.F. Skinner
Skinner's contribution was radical behaviorism, or more properly,
behavior analysis.
Behavior analysis sidesteps issues of mind by
assuming environmental determinism and by including the
internal environment (self talk, covert verbal behavior) as part of
the environment.
Thus, dualistic issues are resolved and each human
becomes subject to a unique set of environmental determinants
composed of both external and internal environments.
Skinner revived
Bacon's inductive method and its atheoretical approach.
Operants, the
behaviors emitted by organisms, are selected by the environment in a
quasi-evolutionary way.
Respondents, the behaviors caused by
observable stimuli, were Skinner's term for Pavlovian or classical
conditioning.
Skinner explored the ramifications of operant
conditioning both in the lab and in the field.
Schedules of
reinforcement, programmed instruction, and behavior modification were
three of his most important contributions.
Comments
Skinner expanded on the work of Pavlov and Watson by redefining
the human organism's environment to include the things people say to
themselves.
The same rules of conditioning that apply to the external
environment also apply to that internal environment.
Skinner created
a logical and self-consistent system that continues to have a small but vocal
minority of adherents today.
Operant Conditioning (Video)
Shows aspects of operant conditioning
Escape and Avoidance Conditioning
Escape conditioning: leave an aversive situation
e.g., dog in shuttle box jumps to other side when shocked, you don't go to dentist with mild tooth pain
Avoidance conditioning: signal predicts shock so dog jumps when light comes on and AVOIDS the shock
What happens when the shock is turned off but the light stays on? The dog still jumps (i.e., avoidance conditioning is hard to extinguish)
Notice how most of us wait when our teeth hurt. We avoid the dentist (most of us, that is).
Learned Helplessness
Seligman discovered it
Used dogs in shuttle box
But, he restrained the dogs while they were being shocked
Later, when they were not restrained, most did not jump over the barrier to avoid the shock
They had learned that there was nothing they could do
Applies to humans as well
Think of spousal abuse
Many abused spouses act as if they cannot avoid the abuse
In fact, in many women's shelters it is difficult to keep the abused wives from returning to the abusive situation
Seligman placed human volunteers in a small room doing a repetitive task
In the room was a speaker with a volume knob playing loud, bad music
Seligman had disabled the knob; volunteers learned that it did not work
Later, he enabled the knob (without informing them); hardly anyone retried the knob
They had learned to be helpless
Premack Principle and Response Hierarchy
Premack provided a different definition of reinforcement
Reinforcement was the response, not the stimulus
In addition, everyone has a reinforcement hierarchy at any point (e.g., people would rather watch TV than wash dishes)
Premack Principle says that lower level responses can be reinforced by higher level ones
e.g., make child practice violin (low level response) for 30 minutes and reinforce by allowing child to watch TV program (high level response)
can also self-reinforce that way: make yourself clean the garage (low-level response) by not reading a good book (high-level response) until the garage is clean
You can control your own behavior or others' behavior by using the
Premack Principle.
First, you must learn the reinforcement hierarchy
of the person you want to control.
Then, you must be able to make
items high on that hierarchy dependent upon the performance of items
low on the hierarchy.
For example, my children love candy.
What do
they have to do to get it?
They have to eat other foods first.
Then
they can have some candy.
I like to spend time on the Internet.
But,
before I will allow myself to do so, I must complete some other lower
ranking task like filling out health insurance forms.
My reinforcement hierarchy right now might look like this:
Read electronic mail
Navigate the Internet
Eat a snack
Read the news online
Watch some television
Take a short nap
Take a short walk
Drink some coffee
Clean out my truck
Wash the dishes
Feed the dogs
Make dinner for family
Mow the lawn
Paint the lawn furniture
Install closet shelving
Clean my office
See dentist about rotten tooth
Prepare tomorrow's lecture
Prepare for new class in Fall
Roof house
By making myself wash the dishes before I watch TV, I am using the
Premack Principle.
Can you see why?
How could parents use the Premack Principle at McDonalds?
By using the child's reinforcement hierarchy:
Play in playground
Eat french fries
Eat chicken nuggets
They would have the child eat chicken nuggets before the child could go to the playground.
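Here is that same contingency as a minimal Python sketch, assuming a reinforcement hierarchy is nothing more than an ordered list (most preferred first); the rank and contingency helpers are hypothetical, purely for illustration.

    hierarchy = [
        "play in playground",    # high-preference response
        "eat french fries",
        "eat chicken nuggets",   # low-preference response
    ]

    def rank(activity):
        return hierarchy.index(activity)    # 0 = most preferred

    def contingency(low, high):
        """Make the preferred activity depend on completing the less preferred one."""
        assert rank(high) < rank(low), "the reinforcer must outrank the target response"
        return f"First {low}; then you may {high}."

    print(contingency("eat chicken nuggets", "play in playground"))
    # -> First eat chicken nuggets; then you may play in playground.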
Behavior Modification
Applying the principles of classical conditioning and operant conditioning to the real world
Classroom management: most likely you were subject to some kind of classroom management routine in K-12 (green card/red card, points for good behavior)
Therapy: managing therapeutic situations: reinforce patients for making their beds, coming to dining room, not being disruptive by giving them primary or secondary reinforcement
Behavior modification is the application of principles of
conditioning to the everyday world.
In some sense, all of us are
behavior modifiers.
But, we may not be aware that we are.
For
example, suppose you pick up your newborn every time you hear a cry.
Soon, you will notice an increase in crying. Why? That child has
learned that crying will be reinforced.
Parents have to learn to
extinguish their infants' crying by not picking them up.
Behavior modification is also the name given to intentional
efforts to modify behavior.
They can be as simple as placing objects
that need to be taken somewhere in a prominent position.
For
instance, I put all of the materials I will need for a class on the
floor in the doorway of my office.
Then, when I leave for class, I
have to step over them.
Stepping over those materials reminds me to
take them.
The materials on the floor are a discriminative stimulus for my behavior of
picking them up.
Or, sometimes my chair will give me a note for a
student in my next class.
I put the note under the clip of my pen on
the outside of my shirt.
When I get to class someone will ask me why
I have a piece of paper attached to my shirt.
Then, I deliver the
note.
Behavior modification can also be used in schools and other
settings to promote or to discourage certain behaviors.
For example,
giving elementary students a gold star for performing certain behaviors is
behavior modification.
The gold star is a secondary reinforcer.
A more complicated form of behavior modification is the token
economy.
Token economies will have published rules, tokens, and reinforcers.
In an adolescent halfway house, for example, a rule might be: Make your bed before 8 am.
Adolescents who follow that rule will receive a token.
The tokens are secondary reinforcers because they
can be "cashed in" in for other reinforcers such as edible treats, soft drinks, or permission to attend a movie.
Anything can be a token: a poker chip, a marble, or a value in a smartphone app.
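The bookkeeping behind a token economy is simple enough to sketch. The Python below is a toy illustration, assuming published rules (response -> tokens earned) and a menu (reinforcer -> token price); every name and price here is a hypothetical example.

    rules = {                        # responses that earn tokens
        "made bed before 8 am": 1,
        "on time to meals": 1,
        "no fighting all day": 3,
    }
    menu = {                         # what tokens can be cashed in for
        "soft drink": 2,
        "edible treat": 3,
        "weekend movie": 10,
    }
    balances = {"resident A": 0}

    def earn(person, response):
        balances[person] += rules[response]   # token delivered per the published rule

    def cash_in(person, reinforcer):
        price = menu[reinforcer]
        if balances[person] >= price:
            balances[person] -= price         # tokens exchanged for the backup reinforcer
            return True
        return False                          # not enough tokens yet

    earn("resident A", "made bed before 8 am")
    earn("resident A", "no fighting all day")
    print(cash_in("resident A", "soft drink"), balances["resident A"])   # True 2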
Ethical issues surround behavior modification.
One issue
revolves around its use with certain groups.
For example, few
question the right of parents to modify their children's behavior.
However, do we have the right to apply behavior modification
everywhere and anywhere?
If people ask to change, then there is
usually no problem.
But if we apply behavior modification to people
who do not want to change, that is an altogether different situation.
For example, prisoners who are asked to volunteer for
violence-reduction training may do so because they know it will
likely lead to early parole.
So, they are not really volunteers but
are being coerced.
Operant Conditioning in Practice
Operant conditioning is not limited to laboratory situations.
Here, I
want to illustrate some ways that operant conditioning can be used
daily.
Call forwarding--When I forward my calls to another extension,
I turn the handset on my wired office phone upside down.
When I return I do not have to
remember that I set the phone to call forward.
The upside down
receiver is a discriminative stimulus for me to turn off call
forwarding.
I did have to tell the building custodian not to turn the
handset over, however.
Piles on the floor--Before I go to class, I make a pile of
the materials I will need in front of my door.
The pile has to be in
a location where I will have to step over it before I walk out.
Again,
the pile serves as a discriminative stimulus for the behavior of
picking up the pile.
In fact, I often have to admonish (helpful) visitors not
to pick up the pile!
Put it on my chair--Over the years, I have conditioned my
colleagues NOT to put items on my desk; they will be lost forever.
Instead, I reward them for putting those items on my chair.
If they
do, I am guaranteed to see them because I will not sit down without
picking them up and looking at them.
Brat prevention--Children are good operant conditioners
because they are so persistent.
Take the following situation, for
example.
A child is riding with an adult, and the child is thirsty.
So, the child asks to stop and get a drink.
The adult says no, the
child asks again, and again, and again...
Finally, the adult gives
in, saying, "All right, just this once." Big mistake, right?
Why? The
adult has now put the child on a partial schedule, guaranteeing a
repetition of the same behavior later on.
Instead, the adult should
have said:
"All right, I'll get you a drink IF you don't ask for one
for the next 10 (time may have to vary, depending on the child)
minutes."
Then, the adult is providing the child with positive
reinforcement for being quiet.
Superstitious Behavior
Skinner discovered that organisms pay close attention to the relationship between their behavior and what follows it
He fed pigeons a treat after they made just one random response. The pigeons continued to make that response even though they were never fed again for making that response.
Humans, too, develop superstitious behaviors: e.g., athletes wearing same underwear for every game
Imagine you don't do well on your first test, so you buy a lucky rabbit's foot and do well on the next test. What will you bring to subsequent tests? (The rabbit's foot.) Is there a real connection between your success and the rabbit's foot? Of course not.
Skinner demonstrated that behaviors selected at random and reinforced
would be maintained.
This finding was extended to superstitious
behavior.
Superstitious behavior is when conditioning occurs to
R-->SR pairings that simply happen by chance.
In other
words, there is no contingency between the two, but animals and people act as
if there were.
For example, consider the pairing of holding a rabbit's foot and
getting a high grade on a test.
Now, the next time a test comes
around, you hold the rabbit's foot.
Most likely, there is no
connection between test taking and holding a rabbit's foot, but you
act as if there were.
Another example is wearing "lucky socks."
Baseball players might
do this.
Suppose a hitter is in a slump; his mother sends him some
new socks, he wears them, and he gets three hits in the next game.
What does he do next?
He wears the socks again.
What is the
likelihood of the socks having contributed to his getting the three
hits?
So, when a response and a reinforcer follow each other, but only
by chance, a superstitious behavior may develop.
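A small simulation shows how chance R-->SR pairings can maintain a behavior. This Python sketch assumes a toy learner that strengthens whichever response happened to precede food; the feeder runs on a timer and ignores behavior entirely, and all names and numbers are my own illustrative choices.

    import random

    responses = ["peck", "turn left", "bob head", "flap"]
    strength = {r: 1.0 for r in responses}

    def emit():
        """Choose a response in proportion to its current strength."""
        return random.choices(responses, weights=[strength[r] for r in responses])[0]

    for t in range(1, 1001):
        r = emit()
        if t % 20 == 0:           # food every 20 ticks, whatever the bird is doing
            strength[r] += 1.0    # yet the response that preceded food is strengthened

    print(max(strength, key=strength.get), strength)
    # One arbitrary response usually comes to dominate, despite no real contingency.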
Drug Detecting Dogs (p. 184)
Marijuana at airport: 90% success
Heroin at airport: 70% success
False alarms are a problem
Putting It All Together (p. 190)
As in classical conditioning, timing affects acquisition
Shorter intervals between response and reinforcement lead to faster acquisition
Choice Behavior
Matching Law (p. 198)
Animals (pigeons, rats, humans) distribute their responses among choice alternatives so that their response proportions closely match the proportions of reinforcement available
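In its best-known form (Herrnstein's equation, stated here from general knowledge rather than from the notes above), the matching law for two alternatives is:

    B1 / (B1 + B2) = R1 / (R1 + R2)

where B1 and B2 are the response rates on the two alternatives and R1 and R2 are the reinforcement rates they deliver. For example, if key 1 pays off 30 times per hour and key 2 pays off 10 times per hour, a pigeon will devote about 30 / (30 + 10) = 75% of its pecks to key 1.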
Brain Substrates
Dorsal striatum: Important for S-R learning (e.g., classical and operant conditioning).
Orbitofrontal cortex: Important in the prediction of outcomes of particular responses.
Hedonic value: the subjective desirability of an operant reinforcer.
Motivational value: the degree to which an organism will work to obtain access to a stimulus.