Explanations
Proper nouns and significant names in a variety of contexts
Negative Logits
IF                      -0.17
WHO                     -0.16
SEE                     -0.16
SEE                     -0.15
zilla                   -0.15
eteor                   -0.15
adaş                    -0.15
ewise                   -0.15
gi                      -0.15
(token not rendered)    -0.15
Positive Logits
¨                       0.17
geçen                   0.15
etat                    0.14
inge                    0.14
gắn                     0.14
ubb                     0.14
onga                    0.14
tether                  0.14
ケット                  0.14
anking                  0.14
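On dashboards of this kind, the negative and positive logit lists are typically obtained by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that reading; `w_dec_row` (the feature's decoder vector), `w_u` (the unembedding, shape d_model x d_vocab) and `logit_effects` are hypothetical names, and a real pipeline may normalize the direction before projecting.

```python
import numpy as np

def logit_effects(w_dec_row: np.ndarray, w_u: np.ndarray, k: int = 10):
    """Hypothetical sketch: direct effect of one feature on every vocabulary logit.

    w_dec_row: shape (d_model,), the feature's decoder direction (assumed input).
    w_u:       shape (d_model, d_vocab), the unembedding matrix (assumed input).
    Returns (bottom_k, top_k), each a list of (token_id, value) pairs.
    """
    effects = w_dec_row @ w_u          # one value per vocabulary token, shape (d_vocab,)
    order = np.argsort(effects)        # token ids sorted from most negative to most positive
    bottom = [(int(i), float(effects[i])) for i in order[:k]]
    top = [(int(i), float(effects[i])) for i in order[::-1][:k]]
    return bottom, top

# Toy usage: with an identity unembedding, the effects equal the decoder vector itself.
bottom, top = logit_effects(np.array([0.1, -0.3, 0.5, 0.0]), np.eye(4), k=2)
```

Mapping the returned token ids back to strings with the model's tokenizer would give lists shaped like the two above.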
Activation Density: 0.173%
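Activation density is commonly reported as the percentage of token positions on which the feature fires. A minimal sketch, assuming `acts` holds one activation value per token from some evaluation corpus and that "fires" means a value above zero; the function name and the `threshold` parameter are illustrative, not taken from the dashboard. At 0.173%, the feature would be active on roughly 1,730 of every million tokens.

```python
import numpy as np

def activation_density(acts, threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of token positions with activation above threshold."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * float(np.mean(acts > threshold))

# Toy usage: one active position out of four gives 25%.
density = activation_density([0.0, 0.0, 1.5, 0.0])
```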