









-
Shannon, C.E.
(1949)
-
Inventor of
information theory (communication theory)
-
Published at
Bell Laboratories in 1948
-
The theory is based
on signal transmission and processing for telecommunication, such as
radio, television, computers, and other data processing devices
-
Considered as
the first to separate the problem of delivering a message from the
meaning of the message
-
Movement
scientists adopted the concepts and transferred them to the human system,
where nerves act as information transmission channels
-
Shannon-Hartley Theorem
-
Information rate tells us
how fast information can travel over a given channel
-
When a source sends r
messages per second and the entropy (uncertainty) of each message is H
bits per message, then the information rate is
-
R = r*H
[1]
-
It might seem that errors must increase as R increases,
but this does not necessarily happen as long as R ≤
C (channel capacity: the maximum information rate)
-
If signals are transmitted
at an information rate R such that R ≤
C, the error probability can be made arbitrarily small,
provided that the information is coded intelligently
-
Shannon-Hartley Theorem
C = W*log2(1+S/N)
[2]
where W is bandwidth (the frequency range bounded by the points where the
signal power drops to half of its maximum), S is signal power, and N is noise
power
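As a rough numerical illustration of equations [1] and [2], the following Python sketch computes an information rate and a channel capacity and checks whether R ≤ C. The function names and the example numbers (1000 messages per second of 8 bits each, a 3000 Hz channel with a signal-to-noise power ratio of 1000) are assumptions chosen for illustration, not values from these notes.

```python
import math

def information_rate(r, H):
    """Equation [1]: R = r * H, in bits per second."""
    return r * H

def channel_capacity(W, snr):
    """Equation [2]: C = W * log2(1 + S/N), with S/N given as a power ratio."""
    return W * math.log2(1 + snr)

# Hypothetical example values (assumptions for illustration only)
R = information_rate(r=1000, H=8)        # 1000 messages/s, 8 bits per message
C = channel_capacity(W=3000, snr=1000)   # 3 kHz bandwidth, S/N = 1000

print(f"R = {R:.0f} bits/s, C = {C:.0f} bits/s")
print("R <= C, so low-error coding is possible in principle:", R <= C)
```

With these assumed numbers R falls well below C, which is the regime where suitable coding can keep the error probability small.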
-
Basic ideas of
information theory
-
Information
theory provides a yardstick for measuring organization
-
A well-organized
system is very predictable -> you do not learn very much which is new ->
this system has little information
-
On the other
hand, a poorly organized system is not predictable -> you learn very
much which is new -> this system has much information
-
Logarithm (A^x = B
-> logA(B) = x)
-
Base 10, e, or 2 (binary)
-
log(m) + log(n) = log(m*n)
-
log(m) - log(n) = log(m/n)
-
log(1) = 0
-
n*log(m) = log(m^n)
-
Probability
-
p(A and B) = p(A) X p(B), when A and B are independent
-
p(A or B) = p(A) + p(B) - p(A and B)

-
Probability of "not A": p(A') = 1-p(A)
-
Conditional probability (probability of A given
B): pB(A)
-
An amount of
information depends on the reduction of Uncertainty
-
Suppose there are 8 cards numbered from 1 to 8. I
pick one in my mind and ask you to guess which one I picked. On
average, you would expect to ask about 4 questions such as "Is it 2?" (Strategy
1)
-
If we base the unit of information on the average
number of answers before the number is found, we arrive at a measure of
difficulty
-
You can use a different strategy of questioning,
such as "Is the number smaller than 5?" (Strategy 2) to reduce the
number of questions
-
It seems reasonable to base the unit of difficulty
on the number of questions (answers) required when the optimal strategy
is used; this is the basis of the standard unit of selective
information
-
Since the amount of information is closely related
to the number of cards, we write an amount of information as a function
of the number of cards as follows:
H(8) = 3 units of information/uncertainty per number
H(16) = 4 units of information/uncertainty per number
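The two questioning strategies can be compared with a small Python sketch. It counts the questions needed for every possible card and averages them. The function names are made up for this illustration, and the sketch assumes that under Strategy 1 the last remaining card is known without being asked about.

```python
import math

def questions_sequential(n, target):
    """Strategy 1: ask "Is it 1?", "Is it 2?", ... until the card is known.

    After n - 1 "no" answers the last card is known without asking.
    """
    return min(target, n - 1)

def questions_halving(n, target):
    """Strategy 2: repeatedly ask whether the card lies in the lower half."""
    low, high = 1, n
    count = 0
    while low < high:
        mid = (low + high) // 2
        count += 1
        if target <= mid:
            high = mid
        else:
            low = mid + 1
    return count

for n in (8, 16):
    cards = range(1, n + 1)
    avg_seq = sum(questions_sequential(n, t) for t in cards) / n
    avg_half = sum(questions_halving(n, t) for t in cards) / n
    print(f"n={n}: sequential={avg_seq}, halving={avg_half}, log2(n)={math.log2(n)}")
```

For 8 and 16 cards the halving strategy always needs exactly log2(n) questions, which is where H(8) = 3 and H(16) = 4 come from, while the sequential strategy needs more on average.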
-
The amount of information is called "Uncertainty"
(Entropy) in this sense and denoted by H
-
Norbert Wiener's point: use the name 'negative
entropy' for H since 'Just as the amount of information in a system is a
measure of its degree of organization, so the entropy of a system is a
measure of its degree of disorganization; and the one is simply the
negative of the other' (Wiener, 1948)
-
-
One way of
measuring is taking ratios or halves; one unit of information is gained
when half of the alternatives are eliminated as in Strategy 2;
bits = log2(x), where x is the number of possible
states [3]
-
A message has probability
p when it is one of 1/p equally likely possibilities;
information of message =
log2(1/p)=-log2(p)
[4]
-
Average amount of
information = mean logarithmic probability for all messages from one
source
H(x) = mean of [-log2(p)]
= ∑p[-log2(p)]
[5]
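Equations [4] and [5] translate almost directly into code. The sketch below is a minimal Python version; the function names and the skewed example distribution are assumptions for illustration.

```python
import math

def surprisal(p):
    """Equation [4]: information of a message with probability p, in bits."""
    return -math.log2(p)

def entropy(probs):
    """Equation [5]: H = sum of p * (-log2 p); zero-probability terms add 0."""
    return sum(p * surprisal(p) for p in probs if p > 0)

uniform = [1 / 8] * 8          # 8 equally likely cards
skewed = [0.9, 0.05, 0.05]     # hypothetical, highly predictable source

print(entropy(uniform))        # 3.0 bits, matching H(8) = 3
print(entropy(skewed))         # well below log2(3): little is learned per message
```

The predictable (well-organized) source has low entropy and the uniform source has the maximum entropy for its number of states, in line with the "yardstick for measuring organization" idea.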
-
Related sources X and Y
-
Joint information = information of x + information of y -
information shared by x and y
-
H(x,y) = H(x)
+ H(y) - I(x:y)
[6]
-
e.g., H(x) is the information in the stimulus (S), H(y) is that in the response
(R), and I(x:y) is the degree of dependency between S and R; for
perception, I(x:y) can be considered a measure of discrimination
-
Channel capacity (C)
-
-
Upper limit of information transmission is
I(x:y)
-
C = I(x:y)
= H(x) + H(y) - H(x,y)
[7]
-
C can be viewed as
another version of the Weber fraction
-
Redundancy: successive occurrences are not
independent; they have some redundancy captured by
I(x:y)
-
Information
theory (communication theory)
-
Invented by
Shannon
-
If there is a systematic
relationship between two variables x and y, something about x can be
known from the present state of y
channel
x ---------------------> y
-
What is the uncertainty of x ? (how much don't we
know about x?)
-
How much of the uncertainty about X does Y resolve?
-
One random variable
-
Let's consider a random variable, x
-
Let's assume that x has a finite set of states: x0,
x1, x2, ..., xn
-
The probability of xi
can be written px(xi)
-
The probabilities of all states of x sum to one:
∑px(xi)=1
-
Probability distribution

-
Two random variables
-
Let's consider two random variables x and y
-
Probability distribution

-
The probability of co-occurrence of xi
and yi
pairs = pxy(xi,yi)
-
If x and y are unrelated
and independent of each other,
pxy(xi,yi)
= px(xi)*py(yi)
-
However, pxy(xi,yi)
= px(xi)*py(yi)
is not always true
-
Uncertainty (McGill and
Quastler, 1955) or Entropy (Shannon, 1948)
-
Uncertainty of xi
h(xi)=-log[px(xi)]
-
Average uncertainty of all xi
H(x)=∑px(xi)*h(xi)=-∑px(xi)*log[px(xi)]
[8]
-
In the two-variable situation, the joint uncertainty is
H(x,y)=-∑pxy(xi,yi)*log[pxy(xi,yi)], summed over all pairs
[9]
-
Therefore,
I(x:y) =
H(x) +
H(y) - H(x,y)
[10]
I(x:y) is how much, on average,
you learn about xi,
by seeing yi
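Equation [10] can be checked numerically from a joint probability table. The following Python sketch assumes the joint distribution is given as a nested list `joint[i][j]` (a representation chosen for this example) and uses base-2 logarithms so the result is in bits.

```python
import math

def entropy(probs):
    """Average uncertainty in bits; zero-probability states contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """joint[i][j] = pxy(xi, yj). Returns I(x:y) = H(x) + H(y) - H(x,y)."""
    px = [sum(row) for row in joint]            # marginal of x (row sums)
    py = [sum(col) for col in zip(*joint)]      # marginal of y (column sums)
    pxy = [p for row in joint for p in row]     # all joint probabilities
    return entropy(px) + entropy(py) - entropy(pxy)

independent = [[0.25, 0.25], [0.25, 0.25]]   # y tells us nothing about x
dependent = [[0.5, 0.0], [0.0, 0.5]]         # y determines x completely

print(mutual_information(independent))  # 0.0
print(mutual_information(dependent))    # 1.0
```

In the independent table the joint probability equals the product of the marginals, so nothing about x is learned from y; in the dependent table the full one bit of uncertainty about x is resolved by y.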
-
Example
-
If we drop two
balls (x and y) on a k X k grid panel (k X k = N) independently
[11]
-
Let's assume that we are dropping two
balls (x and y) on a k X k grid panel (k X k = N) at the same time
-
Two conditions
-
Condition 1: Drop two balls at the
same time
-
Condition 2: Tie two balls with a
string of length 2 and drop them at the same time
-
Condition 1
-
[12]
-
[13]
-
Condition 2
-
[14]
-
[15]
-
Therefore, what we know about y from
x is large in Condition 2 as compared to Condition 1
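The comparison between the two conditions can be sketched in Python by enumerating the joint distribution of the two landing cells. The model of the string is an assumption made only for this illustration: ball x lands uniformly on the grid, and in Condition 2 ball y lands uniformly on any cell at most `length` rows and columns away from x. The grid size k = 10 is likewise hypothetical.

```python
import math
from itertools import product

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """joint maps (x_cell, y_cell) -> probability; returns I(x:y) in bits."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return entropy(px.values()) + entropy(py.values()) - entropy(joint.values())

def joint_independent(k):
    """Condition 1: both balls land uniformly and independently."""
    cells = list(product(range(k), repeat=2))
    p = 1.0 / len(cells) ** 2
    return {(x, y): p for x in cells for y in cells}

def joint_tied(k, length=2):
    """Condition 2 (assumed model): y is uniform over cells near x."""
    cells = list(product(range(k), repeat=2))
    joint = {}
    for x in cells:
        near = [y for y in cells
                if abs(x[0] - y[0]) <= length and abs(x[1] - y[1]) <= length]
        for y in near:
            joint[(x, y)] = 1.0 / (len(cells) * len(near))
    return joint

k = 10  # hypothetical grid size
print("Condition 1:", mutual_information(joint_independent(k)))  # close to 0 bits
print("Condition 2:", mutual_information(joint_tied(k)))         # clearly above 0 bits
```

Under these assumptions the string confines y to a small neighbourhood of x, so observing x removes much of the uncertainty about y, whereas in the independent case it removes none.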
-
Speed-Accuracy
Trade-Off (Fitts' Law)
-
Woodworth (1899)
-
Drawing
different distances of lines at different speeds
-
Drawing movement
consisted of a ballistic phase (open loop, feedforward) and a
current-control phase (closed loop, feedback)
-
Accuracy
decreases as drawing speed increases
-
Fitts (1954)
-
First
application of information theory to the motor system
-
Isolating
motor processes, by using over-learnt movements and keeping stimulus
conditions more or less constant, reveals the capacity limitations of the
motor system
-
Capacity is the
ability to consistently produce one class of movements
-
The greater the number of
alternatives, the greater the information processing capacity required
by the movement
-
Information
capacity can be inferred from variability of successive responses in
constant performance: the variability reflects the channel noise in
optimum movement
-
The rate of
information transmission can be estimated from the ratio of the magnitude of
noise to the possible range of responses
-
Channel capacity
depends not only on the average amplitude, but also on the tolerance
-
Tasks
-
Repeated tapping
with a stylus between two rectangles at maximum speed and with different
target widths (W, tolerance range, noise)
-
Disc transfer
from one pin to another with different diameters of pin and center hole
-
Pin transfer
from one set of holes to another with different sizes of pins and movement
amplitudes (A)

-
It was found
that movement time was a logarithmic function of movement amplitude
when target width was constant, and also a
logarithmic function of target width when movement amplitude was constant
-
MT (movement
time) = a + b*log2(2A/W) [16]
-
ID
(index of difficulty) = log2(2A/W) = -log2(W/2A)
[17]
-
IP
(index of performance) = -(1/MT)*log2(W/2A)
[18]
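Equations [16] to [18] can be tried out with a few lines of Python. The intercept `a` and slope `b` below are hypothetical values chosen only to make the example run; in practice they are estimated by fitting measured movement times, and the amplitude and width pairs are likewise made up.

```python
import math

def index_of_difficulty(A, W):
    """Equation [17]: ID = log2(2A / W), in bits."""
    return math.log2(2 * A / W)

def movement_time(A, W, a, b):
    """Equation [16]: MT = a + b * ID."""
    return a + b * index_of_difficulty(A, W)

def index_of_performance(A, W, MT):
    """Equation [18]: IP = ID / MT, in bits/s when MT is in seconds."""
    return index_of_difficulty(A, W) / MT

# Hypothetical intercept (s) and slope (s/bit); assumptions for illustration.
a, b = 0.05, 0.1

for A, W in [(8, 2), (16, 2), (16, 1)]:   # same units for A and W
    ID = index_of_difficulty(A, W)
    MT = movement_time(A, W, a, b)
    IP = index_of_performance(A, W, MT)
    print(f"A={A}, W={W}: ID={ID:.1f} bits, MT={MT:.2f} s, IP={IP:.2f} bits/s")
```

Doubling the amplitude or halving the target width each adds one bit to ID, so under the assumed coefficients each change adds the same increment b to the predicted movement time.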
-
A higher ID
requires more decisions and therefore more channel capacity,
so movement slows down for difficult tasks
-
When the rate of
information processing is optimized (maximized by over-learning in this
case), the speed has to be traded off against accuracy
-
Fitts' Law JAVA
demo (from http://ei.cs.vt.edu/~cs5724/g1/tap.html)
-
Crossman and
Goodeve (1963, 1983)
-
Schmidt,
Zelaznik, Hawkins, Frank, and Quinn (1979)
-
Carlton (1994)
-
Newell et al.
(1993): Space-time accuracy during rapid movements
-
Task: fast
pre-programmed movement: elbow movement lasting 150-400 ms
-
Results
-
Timing error is
a decreasing function of movement speed
-
Spatial error is
an increasing function of movement speed
-
Coefficient of
determination between timing error and spatial error within a subject: r2=-0.98
-
The results are
contrary to Schmidt's impulse-variability theory (the velocity has no
effect on timing variability)
-
Kelso, Southard,
and Goodman (1979): Two handed movement coordination
-
Task: reach one
target or two targets with different indices of difficulty as fast as
possible
-
Conditions
-
Two target
widths
-
Two target
amplitudes (distances)
-
Single and
two-handed performance
-
However, the
smaller target was always at the longer distance and the larger target always
at the shorter distance from the starting position

-
Results
-
One hand tasks
satisfied Fitts' Law
-
Two-hand
movements were initiated at the same time and landed on the targets at
the same time regardless of different widths and amplitudes
-
MT (movement
time = total response time - reaction time (RT)) of one-hand and two-hand
movements with the same index of difficulty did not differ
-
When the IDs
of the two hands differed, the MT of the hand with the lower ID
increased; the more difficult task therefore determined movement time in
two-hand performance
-
Hands are not
controlled as separate units, but they perform as a synergy