INDEX
Explanations
references to past events or highlights
Head Attr Weights
0:0.04
1:0.01
2:0.12
3:0.13
4:0.17
5:0.03
6:0.05
7:0.19
8:0.03
9:0.04
10:0.08
11:0.07
Negative Logits
mistakenly   -1.58
mist         -1.55
wrongly      -1.54
incorrectly  -1.53
falsely      -1.53
!=           -1.52
differently  -1.52
otom         -1.50
ationally    -1.49
ealous       -1.49
Positive Logits
mosqu        1.68
firepower    1.51
resurg       1.46
lineback     1.42
amaz         1.41
qualitative  1.38
ibaba        1.37
TBA          1.36
gie          1.36
Legions      1.36
Activations Density: 0.001%