Chapter 5 Operant Conditioning: Learning the Outcome of Behaviors
Behavioral Processes
OPERANT CONDITIONING
Thorndike's Law of Effect is the first statement about reinforcement.
Notice it is not really a behavioral definition: "satisfaction" and "discomfort" are mental states, not observable behaviors
His major contributions, however, were his laws of learning, the most important of which was his law of effect (Thorndike, 1911, p. 244):
"Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond."
Formula for operant is:
SD-->R-->SR
SD is a DISCRIMINATIVE STIMULUS
it "sets the occasion" or informs the organism that a response will be followed by a SR or reinforcer
SR is the reinforcer (see more info below)
R is the response
Images: rat and pigeon operant chambers (both informally called "Skinner boxes")
In operant conditioning a reinforced response is repeated (e.g., a hungry rat learns that pressing a lever produces food; eventually it presses the lever over and over until it is no longer hungry)
The paradigm for operant conditioning is:
SD-->R-->SR
where SD is a discriminative stimulus, R is a response, and SR is a reinforcer.
In operant conditioning, an animal must first make a response. Notice, that is not the case in classical conditioning.
That response is usually preceded by a discriminative stimulus, and
sometimes followed by a reinforcer.
Operant conditioning occurs when the association of response and
reinforcer causes the animal to make the response again later in a
similar situation.
The discriminative stimulus signals the animal that
a response at a given time is likely to be reinforced.
The response
MUST be made, for without it, the reinforcement will not occur.
Finally, the reinforcer has the property of making the response that
precedes it more likely to recur.
Example: Think of a traffic light.
It's a complex discriminative stimulus.
Green means go.
Red means stop.
Example: Imagine that you are in a future classroom and a green light appears on your console.
Were you to raise your hand and ask a question, you would get a $20 bill.
If you never raise your hand, no money appears.
But if you do raise your hand and get the money, you'll probably raise your hand again.
And when you raise your hand while the green light is off, nothing happens: no $20 bill.
Soon, the green light comes on, you raise your hand, and the money is dispensed.
Now, that green light has become a discriminative stimulus.
If it is on AND you raise your hand you'll get the money.
If it is off and you raise your hand nothing happens.
What do you think you'll do when that green light appears?
The apparatus used to study operant conditioning removes many
distractors.
An animal in an operant chamber will eventually direct
its attention to the manipulandum (e.g., the lever or key) that
activates the reinforcer.
So, operant conditioning can be defined as a procedure in which a
response followed by a stimulus recurs.
It recurs because of the
stimulus.
That stimulus is called a reinforcer.
Reinforcers are
stimuli that make the response that preceded them more likely to
recur.
Typical reinforcers include the giving of food or drink, and
the removal of shock or pain.
Other properties of
reinforcers will be covered below.
Discriminative stimuli set the occasion for a response and its
associated reinforcer to occur.
So again, a green traffic light is
the discriminative stimulus for the response of pressing on the
accelerator of your car.
But the facial expression of your boss may set
the occasion for you NOT to ask your boss for a raise (because of a
likely negative answer to your question).
Reinforcement increases the response
Positive reinforcement is delivered after the response and the response continues
e.g., you raise your hand in class and $20 comes out of your desk. What do you do? (Raise your hand again.)
Negative reinforcement: a stimulus is taken away after the response and the response continues
Good examples: raise umbrella when it's raining (you are no longer wet);
Take out trash when spouse nags you (nagging stops)
Positive reinforcement is when a response is followed by the addition
of a stimulus, and then that response is more likely to recur.
Negative reinforcement, on the other hand, is when a response is
followed by the removal of a stimulus and then that response is more
likely to recur.
Notice that negative reinforcement also makes the
response more likely to recur.
Let's revisit that hypothetical
classroom in the near future.
Now, when you sit at your desk, you are
subjected to electric shock. Whenever you are at your desk, you are
being shocked.
One day you ask a question, and the shock disappears,
briefly.
You ask another question, and it disappears briefly again.
Soon, you are asking a lot of questions.
Your question asking is also
being reinforced, but now by the removal of a stimulus, or by
negative reinforcement.
Discriminative Stimuli
SD-stimulus signaling opportunity for reinforcement, pigeon and green light
"Want a back rub?"
SΔ -stimulus signaling the lack of opportunity for reinforcement, frown
"I have such a headache" (SΔ is used to indicate that reinforcement will not follow.)
Discriminative stimuli are the first step in a reinforcement situation
SD -> R -> SR
Or, discriminative stimulus, followed by response, followed by reinforcement
Example: Red or green traffic lights
Shaping is reinforcing successive approximations of a final desired response.
In the video below, notice how Skinner gradually gets the pigeon to execute a full turn (the final desired response) by first reinforcing any left turn and later reinforcing only half turns.
The light tells the pigeon that reinforcement has been delivered.
Notice how quickly the pigeon makes the full turn.
Skinner could have just waited a long time for the pigeon to make a full turn and then reinforced it, but shaping makes that wait unnecessary.
Shaping, then, is a technique for getting to a final desired response more quickly.
See Skinner shaping a pigeon in class (below)
He rewards the pigeon with food for making more and more of a circle by turning to the left
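To see the logic of shaping as a procedure, here is a minimal sketch in Python. It assumes a toy model of the pigeon in which reinforcement enlarges the bird's typical turn; observe_turn, reinforce, and all the numbers are illustrative assumptions of mine, not Skinner's actual procedure.

    import random

    tendency = 60.0    # largest left turn (degrees) the bird makes at first

    def observe_turn():
        """Watch one response: how far left did the bird turn?"""
        return random.uniform(0, tendency)

    def reinforce():
        """Deliver food; reinforcement strengthens the reinforced behavior."""
        global tendency
        tendency = min(360.0, tendency + 60.0)

    criterion = 45.0   # the current approximation required to earn food
    while criterion < 360.0:
        if observe_turn() >= criterion:               # response meets the approximation
            reinforce()
            criterion = min(360.0, criterion + 30.0)  # demand a bit more next time
    print("Full turn shaped.")

The moving criterion is the whole trick: each reinforcement both strengthens turning and raises the bar for the next reinforcer, so the final response appears far sooner than if we simply waited for it.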
Primary: biologically relevant reinforcers such as food, water, pleasure, and pain (there are only a few primary reinforcers and they are, essentially, the same as UCSs)
Secondary: secondary reinforcers come to act as reinforcers when they are associated with primary reinforcers
Consider the following sequence in dog training: "sit" (dog eventually sits)->"good dog" (give a treat, repeat)
Eventually the dog will treat the words "good dog" as a reinforcer, a secondary reinforcer
Most human behavior is reinforced by secondary reinforcers
You, for example, will do things to increase your number of points in this class
I'd bet many of you would wash my car for 10 points added to your score, right? (Too bad that's unethical)
Money, by definition, is a secondary reinforcer (you can't eat it or drink it)
Secondary: associated with primary (usually via classical conditioning) and they ALSO act as reinforcers:
e.g., grades, money, praise (a large number of secondary reinforcers exist)
Primary reinforcers are biological.
Food, drink, and pleasure are the
principal examples of primary reinforcers.
But, most human
reinforcers are secondary, or conditioned.
Examples include money,
grades in schools, and tokens.
Secondary reinforcers acquire their power via a history of
association with primary reinforcers or other secondary reinforcers.
For example, if I told you that dollars were no longer going to be
used as money, then dollars would lose their power as a secondary
reinforcer.
Another example would be in a token economy.
Many therapeutic
settings use the concept of the token economy.
Remember, a token is
just an object that symbolizes some other thing.
For example, poker
chips are tokens for money.
Back in the day in New York City, subway tokens used to be
pieces of metal that could be inserted into the turnstiles of the
subway.
Image: old NYC subway token; at one point one was worth $1.25, so four tokens = $5.00, but only in NYC
Small debts were often paid off using tokens in New York because
of the token's value of one subway ride.
However, attempting to pay
off debts elsewhere using NYC subway tokens would not be acceptable.
In a token economy, people earn tokens for making certain
responses; then those tokens can be cashed in for privileges, food,
or drinks.
For example, residents of an adolescent halfway house may
earn tokens by making their beds, being on time to meals, not
fighting, and so on.
Then, being able to go to the movies on the
weekend may require a certain number of tokens.
Punishment decreases the response
NOTE: Punishment is NOT the exact opposite of reinforcement because punishment arouses negative emotions while reinforcement does not arouse positive emotions
Positive punishment is delivered after the response and the response stops
e.g., you raise your hand in that future classroom and you get a shock. What do you do? (Not raise hand again.)
Negative punishment is when something is taken away and the response stops
e.g., you raise your hand in class and points toward your grade are taken away (like a fine). What do you do? (Not raise your hand again.)
Good examples: child smarting off, take away phone, defensive football player delivers illegal hit, sent out of game. (What does player do in subsequent games?)
Punishment is when a stimulus that follows a response leads to a
lower likelihood of that response's recurring.
Note that
reinforcement, either positive or negative, leads to a higher
likelihood of a response recurring.
Positive punishment is when a response is followed by the addition
of a stimulus, and that response then occurs less often.
Note that this scene
sometimes occurs in real classrooms.
If a student asks a question and
then hears something like
"That's the stupidest question I ever
heard" from the instructor,
that student will likely not ask many
questions in the future.
Negative punishment is when a response is followed by the removal
of an already present stimulus, and that leads to that response's
occurring less often.
For example, in that future classroom again, if
you had to return one of your $20 bills every time you asked a question, you would
probably quit asking questions.
Although it might appear that reinforcement and punishment are
opposites, they are not. Punishment arouses negative emotions such as
fear and anger. But reinforcement does not similarly
arouse positive emotions like love, liking, and attraction.
Think of
jilted suitors who ask why their partners left.
They might wonder
why their partners left even though they gave their partners
expensive gifts.
Those expensive gifts did not, in and of themselves,
lead to love.
Skinner (1953) offered three reasons why punishments should not be administered:
they only work temporarily,
they create conditioned stimuli that lead to negative emotional reactions,
and they reinforce escape from the conditioned situation in the future.
He wrote (pp. 192–193):
Civilized man has made some progress in turning from punishment to alternative forms of control . . . But we are still a long way from exploiting the alternatives, and we are not likely to make any real advance so long as our information about punishment and the alternatives to punishment remains at the level of casual observation. As a consistent picture of the extremely complex consequences of punishment emerges from analytical research, we may gain the confidence and skill needed to design alternative procedures in the clinic, in education, in industry, in politics, and in other practical fields.
Schedules of Reinforcement
Skinner analyzed the effect of how reinforcers are delivered
He found that the scheduling of reinforcement made a big difference
There are many reinforcement schedules but the main ones are:
Fixed Interval
get a reinforcer after a fixed period of time (e.g., 10 seconds, one day, one week), provided a response is made
paycheck is an example
Variable Interval
get a reinforcer after a random period of time, provided a response is made
example: working for shady character who pays you on an irregular basis
Fixed Ratio
get a reinforcer after a fixed number of responses (e.g., 10, 50, 200)
piecework is an example (e.g., sewing garments, get paid for each one made)
Variable Ratio
get a reinforcer after a variable and random number of responses
gambling is good example, especially slot machines
Examples
The steeper the cumulative-record curve, the higher the response rate
One of the major discoveries of operant conditioning was that not
only do reinforcers have the power to cause responses to be made more
often, but that how and when those reinforcers are delivered also
affects the pattern of responses.
Controlling the how and when of
reinforcement is a reinforcement schedule.
Schedules are of two main types, time-based and response-based.
Time-based schedules usually contain the word interval, as in time
interval.
Response-based schedules usually contain the word ratio,
referring to the ratio of responses to reinforcers.
Fixed interval (FI) schedules reinforce the first response made after a given interval, measured from the preceding reinforcement.
A given interval is indicated by the addition of a number to the letters FI (seconds, usually).
Thus, in FI 15, the first response that occurs fifteen seconds or more after the preceding reinforcement is reinforced.
Variable interval (VI) schedules reinforce any response made after a variable amount of time.
A VI 20 would reinforce after an average of 20 seconds, not every 20 seconds.
Fixed ratio (FR) schedules deliver a reinforcer based upon
a constant number of responses.
For example, an FR-10 schedule would
deliver a reinforcer every 10th response.
Variable ratio (VR) schedules are similar to fixed ratio,
except that the number of responses required for a reinforcer changes
each time.
So, a VR-15 schedule would deliver a reinforcer after an
average of 15 responses, not on every 15th response.
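Each of the four schedules is just a decision rule: given a response at time t, does a reinforcer follow? Here is a minimal sketch in Python; the class names are my own, and it assumes the organism responds once per discrete time step.

    import random

    class FixedInterval:
        def __init__(self, interval):
            self.interval, self.last = interval, 0
        def reinforce(self, t):
            # first response at least `interval` after the last payoff is reinforced
            if t - self.last >= self.interval:
                self.last = t
                return True
            return False

    class VariableInterval:
        def __init__(self, mean):
            self.mean = mean
            self.next = random.expovariate(1 / mean)   # random wait, mean `mean`
        def reinforce(self, t):
            if t >= self.next:                         # first response after the wait
                self.next = t + random.expovariate(1 / self.mean)
                return True
            return False

    class FixedRatio:
        def __init__(self, n):
            self.n, self.count = n, 0
        def reinforce(self, t):
            self.count += 1
            if self.count == self.n:                   # every nth response pays off
                self.count = 0
                return True
            return False

    class VariableRatio:
        def __init__(self, mean_n):
            self.p = 1 / mean_n                        # pays off with probability 1/n
        def reinforce(self, t):
            return random.random() < self.p            # like a slot machine pull

    vr = VariableRatio(15)                             # a VR-15 "slot machine"
    payoffs = sum(vr.reinforce(t) for t in range(10000))
    print(f"10,000 pulls paid off {payoffs} times (expect about 667)")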
Let's examine some everyday examples of reinforcement schedules
and their effects.
A paycheck is a good example of an FI schedule.
Workers get a check once a week, for example, if they show up and
work. They do not get rewarded for working harder, or penalized for
working less.
Workers who work by the piece or by the job, piecework,
are paid more if they produce more, and are paid less if they produce
less.
Piecework is an example of an FR schedule.
Workers typically
work harder on FR schedules than they do on FI schedules.
Gambling is the classic example of a VR schedule.
Part of the
allure of gambling is its uncertain payoff.
Imagine a slot machine
that paid off every 10th time; only the 10th pull would be exciting.
A real slot machine, on the other hand, pays off on a random basis,
so each pull is exciting.
VR schedules maintain behavior at very high
rates.
Gambling is also addictive.
About the best everyday example of a VI schedule that I can think
of is working for a shady character.
This person pays you, but you
never know when payday is going to be.
It could be a week, two weeks,
a month.
So, you don't work very hard.
You would probably jump to
another job if the pay were the same but given regularly.
B.F. Skinner
Skinner's contribution was radical behaviorism, or more properly,
behavior analysis.
Behavior analysis sidesteps issues of mind by
assuming environmental determinism and by including the
internal environment (self talk, covert verbal behavior) as part of
the environment.
Thus, dualistic issues are resolved and each human
becomes subject to a unique set of environmental determinants
composed of both external and internal environments.
Skinner revived
Bacon's inductive method and its atheoretical approach.
Operants, the
behaviors emitted by organisms, are selected by the environment in a
quasi-evolutionary way.
Respondents, the behaviors caused by
observable stimuli, were Skinner's term for Pavlovian or classical
conditioning.
Skinner explored the ramifications of operant
conditioning both in the lab and in the field.
Schedules of
reinforcement, programmed instruction, and behavior modification were
three of his most important contributions.
Comments
Skinner expanded on the work of Pavlov and Watson by redefining
the human organism's environment to include the things people say to
themselves.
The same rules of conditioning that apply to the external
environment also apply to that internal environment.
Skinner created
a logical and self-consistent system that continues to have a small but vocal
minority of adherents today.
Operant Conditioning (Video)
Shows aspects of operant conditioning
Escape and Avoidance Conditioning
Escape conditioning: leave an aversive situation
e.g., dog in shuttle box jumps to other side when shocked, you don't go to dentist with mild tooth pain
Avoidance conditioning: signal predicts shock so dog jumps when light comes on and AVOIDS the shock
What happens when the shock is turned off but the light stays on? The dog still jumps (i.e., avoidance conditioning is hard to extinguish)
Notice how most of us wait when our teeth hurt. We avoid the dentist (most of us, that is).
Learned Helplessness
Seligman discovered it
Used dogs in shuttle box
But, he restrained the dogs while they were being shocked
Later, when they were not restrained, most did not jump over the barrier to avoid the shock
They had learned that there was nothing they could do
Applies to humans as well
Think of spousal abuse
Many abused spouses act as if they cannot avoid the abuse
In fact, in many women's shelters it is difficult to keep the abused wives from returning to the abusive situation
Seligman placed human volunteers in a small room doing a repetitive task
In the room was a speaker with a volume knob playing loud, bad music
Seligman had disabled the knob; volunteers learned that it did not work
Later, he enabled the knob (without informing them); hardly anyone retried the knob
They had learned to be helpless
Premack Principle and Response Hierarchy
Premack provided a different definition of reinforcement
Reinforcement was the response, not the stimulus
In addition, everyone has a reinforcement hierarchy at any point (e.g., people would rather watch TV than wash dishes)
Premack Principle says that lower level responses can be reinforced by higher level ones
e.g., make child practice violin (low level response) for 30 minutes and reinforce by allowing child to watch TV program (high level response)
can also self-reinforce that way: make yourself clean the garage (low-level response) by not reading a good book (high-level response) until the garage is clean
You can control your own behavior or others' behavior by using the
Premack Principle.
First, you must learn the reinforcement hierarchy
of the person you want to control.
Then, you must be able to make
items high on that hierarchy dependent upon the performance of items
low on the hierarchy.
For example, my children love candy.
What do
they have to do to get it?
They have to eat other foods first.
Then
they can have some candy.
I like to spend time on the Internet.
But,
before I will allow myself to do so, I must complete some other lower
ranking task like filling out health insurance forms.
My reinforcement hierarchy right now might look like this:
Read electronic mail
Navigate the Internet
Eat a snack
Read the news online
Watch some television
Take a short nap
Take a short walk
Drink some coffee
Clean out my truck
Wash the dishes
Feed the dogs
Make dinner for family
Mow the lawn
Paint the lawn furniture
Install closet shelving
Clean my office
See dentist about rotten tooth
Prepare tomorrow's lecture
Prepare for new class in Fall
Roof house
By making myself wash the dishes before I watch TV, I am using the
Premack Principle.
Can you see why?
How could parents use the Premack Principle at McDonalds?
By using the child's reinforcement hierarchy:
Play in playground
Eat french fries
Eat chicken nuggets
They would have the child eat chicken nuggets before the child could go to the playground.
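Here is that same contingency as a minimal Python sketch, assuming a reinforcement hierarchy is nothing more than an ordered list (most preferred first); the rank and contingency helpers are hypothetical, purely for illustration.

    hierarchy = [
        "play in playground",    # high-preference response
        "eat french fries",
        "eat chicken nuggets",   # low-preference response
    ]

    def rank(activity):
        return hierarchy.index(activity)    # 0 = most preferred

    def contingency(low, high):
        """Make the preferred activity depend on completing the less preferred one."""
        assert rank(high) < rank(low), "the reinforcer must outrank the target response"
        return f"First {low}; then you may {high}."

    print(contingency("eat chicken nuggets", "play in playground"))
    # -> First eat chicken nuggets; then you may play in playground.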
Behavior Modification
Applying the principles of classical conditioning and operant conditioning to the real world
Classroom management: most likely you were subject to some kind of classroom management routine in K-12 (green card/red card, points for good behavior)
Therapy: managing therapeutic situations: reinforce patients for making their beds, coming to dining room, not being disruptive by giving them primary or secondary reinforcement
Behavior modification is the application of principles of
conditioning to the everyday world.
In some sense, all of us are
behavior modifiers.
But, we may not be aware that we are.
For
example, suppose you pick up your newborn every time you hear a cry.
Soon, you will notice an increase in crying. Why? That child has
learned that crying will be reinforced.
Parents have to learn to
extinguish their infants' crying by not picking them up.
Behavior modification is also the name given to intentional
efforts to modify behavior.
They can be as simple as placing objects
that need to be taken somewhere in a prominent position.
For
instance, I put all of the materials I will need for a class on the
floor in the doorway of my office.
Then, when I leave for class, I
have to step over them.
Stepping over those materials reminds me to
take them.
The materials on the floor are a discriminative stimulus for my behavior of
picking them up.
Or, sometimes my chair will give me a note for a
student in my next class.
I put the note under the clip of my pen on
the outside of my shirt.
When I get to class someone will ask me why
I have a piece of paper attached to my shirt.
Then, I deliver the
note.
Behavior modification can also be used in schools and other
settings to promote or to discourage certain behaviors.
For example,
giving elementary students a gold star for performing certain behaviors is
behavior modification.
The gold star is a secondary reinforcer.
A more complicated form of behavior modification is the token
economy.
Token economies will have published rules, tokens, and reinforcers.
In an adolescent halfway house, for example, a rule might be: Make your bed before 8 am.
Adolescents who follow that rule will receive a token.
The tokens are secondary reinforcers because they
can be "cashed in" in for other reinforcers such as edible treats, soft drinks, or permission to attend a movie.
Anything can be a token: a poker chip, a marble, or a value in a smartphone app.
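The bookkeeping behind a token economy is simple enough to sketch. The Python below is a toy illustration, assuming published rules (response -> tokens earned) and a menu (reinforcer -> token price); every name and price here is a hypothetical example.

    rules = {                        # responses that earn tokens
        "made bed before 8 am": 1,
        "on time to meals": 1,
        "no fighting all day": 3,
    }
    menu = {                         # what tokens can be cashed in for
        "soft drink": 2,
        "edible treat": 3,
        "weekend movie": 10,
    }
    balances = {"resident A": 0}

    def earn(person, response):
        balances[person] += rules[response]   # token delivered per the published rule

    def cash_in(person, reinforcer):
        price = menu[reinforcer]
        if balances[person] >= price:
            balances[person] -= price         # tokens exchanged for the backup reinforcer
            return True
        return False                          # not enough tokens yet

    earn("resident A", "made bed before 8 am")
    earn("resident A", "no fighting all day")
    print(cash_in("resident A", "soft drink"), balances["resident A"])   # True 2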
Ethical issues surround behavior modification.
One issue
revolves around its use with certain groups.
For example, few
question the right of parents to modify their children's behavior.
However, do we have the right to apply behavior modification
everywhere and anywhere?
If people ask to change, then there is
usually no problem.
But if we apply behavior modification to people
who do not want to change, that is an altogether different situation.
For example, prisoners who are asked to volunteer for
violence-reduction training may do so because they know it will
likely lead to early parole.
So, they are not really volunteers but
are being coerced.
Operant Conditioning in Practice
Operant conditioning is not limited to laboratory situations.
Here, I
want to illustrate some ways that operant conditioning can be used
daily.
Call forwarding--When I forward my calls to another extension,
I turn the handset on my wired office phone upside down.
When I return I do not have to
remember that I set the phone to call forward.
The upside down
receiver is a discriminative stimulus for me to turn off call
forwarding.
I did have to tell the building custodian not to turn the
handset over, however.
Piles on the floor--Before I go to class, I make a pile of
the materials I will need in front of my door.
The pile has to be in
a location where I will have to step over it before I walk out.
Again,
the pile serves as a discriminative stimulus for the behavior of
picking up the pile.
In fact, I often have to admonish (helpful) visitors not
to pick up the pile!
Put it on my chair--Over the years, I have conditioned my
colleagues NOT to put items on my desk; they will be lost forever.
Instead, I reward them for putting those items on my chair.
If they
do, I am guaranteed to see them because I will not sit down without
picking them up and looking at them.
Brat prevention--Children are good operant conditioners
because they are so persistent.
Take the following situation, for
example.
A child is riding with an adult, and the child is thirsty.
So, the child asks to stop and get a drink.
The adult says no, the
child asks again, and again, and again...
Finally, the adult gives
in, saying, "All right, just this once." Big mistake, right?
Why? The
adult has now put the child on a partial schedule, guaranteeing a
repetition of the same behavior later on.
Instead, the adult should
have said:
"All right, I'll get you a drink IF you don't ask for one
for the next 10 (time may have to vary, depending on the child)
minutes."
Then, the adult is providing the child with positive
reinforcement for being quiet.
Superstitious Behavior
Skinner discovered that organisms pay close attention to the relationship between their behavior and what follows it
He fed pigeons a treat after they made just one random response. The pigeons continued to make that response even though they were never fed again for making that response.
Humans, too, develop superstitious behaviors: e.g., athletes wearing same underwear for every game
Imagine you don't do well on your first test, so you buy a lucky rabbit's foot and do well on the next test. What will you bring to subsequent tests? (The rabbit's foot.) Is there a real connection between your success and the rabbit's foot? Of course not.
Skinner demonstrated that behaviors selected at random and reinforced
would be maintained.
This finding was extended to superstitious
behavior.
Superstitious behavior is when conditioning occurs to
R-->SR pairings that simply happen by chance.
In other
words, there is no contingency between the two, but animals and people act as
if there were.
For example, consider the pairing of holding a rabbit's foot and
getting a high grade on a test.
Now, the next time a test comes
around, you hold the rabbit's foot.
Most likely, there is no
connection between test taking and holding a rabbit's foot, but you
act as if there were.
Another example is wearing "lucky socks."
Baseball players might
do this.
Suppose a hitter is in a slump; his mother sends him some
new socks, he wears them, and he gets three hits in the next game.
What does he do next?
He wears the socks again.
What is the
likelihood of the socks having contributed to his getting the three
hits?
So, when a response and a reinforcer follow each other, but only
by chance, a superstitious behavior may develop.
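A small simulation shows how chance R-->SR pairings can maintain a behavior. This Python sketch assumes a toy learner that strengthens whichever response happened to precede food; the feeder runs on a timer and ignores behavior entirely, and all names and numbers are my own illustrative choices.

    import random

    responses = ["peck", "turn left", "bob head", "flap"]
    strength = {r: 1.0 for r in responses}

    def emit():
        """Choose a response in proportion to its current strength."""
        return random.choices(responses, weights=[strength[r] for r in responses])[0]

    for t in range(1, 1001):
        r = emit()
        if t % 20 == 0:           # food every 20 ticks, whatever the bird is doing
            strength[r] += 1.0    # yet the response that preceded food is strengthened

    print(max(strength, key=strength.get), strength)
    # One arbitrary response usually comes to dominate, despite no real contingency.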
Drug Detecting Dogs (p. 184)
Marijuana at airport: 90% success
Heroin at airport: 70% success
False alarms are a problem
Putting It All Together (p. 190)
As in classical conditioning, timing affects acquisition
Shorter intervals between response and reinforcement lead to faster acquisition
Choice Behavior
Matching Law (p. 198)
Animals (pigeons, rats, humans) distribute their responses among choice alternatives so that their response proportions closely match the proportions of reinforcement available
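In its best-known form (Herrnstein's equation, stated here from general knowledge rather than from the notes above), the matching law for two alternatives is:

    B1 / (B1 + B2) = R1 / (R1 + R2)

where B1 and B2 are the response rates on the two alternatives and R1 and R2 are the reinforcement rates they deliver. For example, if key 1 pays off 30 times per hour and key 2 pays off 10 times per hour, a pigeon will devote about 30 / (30 + 10) = 75% of its pecks to key 1.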
Brain Substrates
Dorsal striatum: Important for S-R learning (e.g., classical and operant conditioning).
Orbitofrontal cortex: Important in the prediction of outcomes of particular responses.
Hedonic value: the subjective desirability of an operant reinforcer.
Motivational value: the degree to which an organism will work to obtain access to a stimulus.