INDEX
Explanations
instances of the word "of"
Head Attr Weights
0:0.08
1:0.07
2:0.08
3:0.09
4:0.07
5:0.09
6:0.07
7:0.08
8:0.09
9:0.09
10:0.08
11:0.07
Negative Logits
Kyoto: -2.90
decipher: -2.75
kindred: -2.61
ancest: -2.47
document: -2.46
Scandinavian: -2.46
deline: -2.45
anthrop: -2.45
Yosemite: -2.41
Neurolog: -2.40
Positive Logits
elight: 3.07
odder: 2.96
VERTISEMENT: 2.89
bek: 2.72
zb: 2.64
bh: 2.60
rition: 2.58
rophe: 2.57
BS: 2.56
eker: 2.52
Activations Density 0.000%