Explanations
proper nouns or specific entities among various examples
phrases that refer to groups or collections
Negative Logits
irez        -0.70
adian       -0.70
ifix        -0.66
nery        -0.64
idia        -0.64
bane        -0.63
agos        -0.62
oldemort    -0.57
ysc         -0.57
iltr        -0.56
Positive Logits
st          0.84
IJ          0.83
among       0.69
others      0.69
ī           0.69
ĪĴ          0.69
Īè          0.68
stad        0.67
¸           0.66
whom        0.64
Activations Density 0.027%