Explanations
proper nouns, particularly names and organizational titles
Negative Logits
Jenny    -0.18
ħ        -0.17
uyo      -0.16
atest    -0.16
ahren    -0.16
owski    -0.16
Burnett  -0.15
Weiss    -0.15
arez     -0.15
okino    -0.15
Positive Logits
ilion    0.20
§        0.18
982      0.17
Alley    0.17
agher    0.17
š        0.17
Walton   0.17
Foley    0.16
thro     0.16
alion    0.16
Activations Density 0.243%