INDEX
Explanations
references to political figures and significant events
Head Attr Weights
0:0.07
1:0.06
2:0.02
3:0.06
4:0.10
5:0.07
6:0.03
7:0.01
8:0.06
9:0.43
10:0.02
11:0.01
Negative Logits
Annotations:-2.38
Located:-2.32
Afee:-2.25
ori:-2.13
Usually:-2.13
Usually:-2.04
folios:-2.01
usually:-2.01
Probably:-2.01
Estimates:-2.01
Positive Logits
somehow:2.53
sufficiently:2.53
someday:2.52
pans:2.33
succeeds:2.29
fails:2.27
materially:2.13
pires:2.13
weren:2.08
bothers:2.05
Activations Density 0.316%