Explanations
proper nouns and geopolitical terms
Negative Logits
Token      Logit
Ò          -0.80
thood      -0.75
����       -0.74
ceive      -0.72
eno        -0.70
Ïī         -0.69
imi        -0.68
leeve      -0.67
without    -0.67
ÏĢ         -0.67
Positive Logits
Token      Logit
oret       1.63
resa       1.36
odore      1.31
biggest    1.25
ories      1.21
easiest    1.19
downside   1.19
latter     1.18
simplest   1.18
earliest   1.11
Activations Density: 2.264%