Explanations
proper nouns, particularly names and titles
Negative Logits
Token       Logit
orama       -0.19
opi         -0.18
fram        -0.16
ften        -0.15
McKenzie    -0.15
bc          -0.15
emax        -0.15
illum       -0.15
apon        -0.15
å´İ         -0.15
Positive Logits
Token       Logit
Couch       0.16
ÑĦиÑĨи      0.15
_simps      0.15
Tent        0.14
-UA         0.14
ueva        0.14
anky        0.14
-fi         0.14
hold        0.14
ugi         0.14
Activations Density 0.102%