INDEX
Explanations
proper nouns, particularly names and brands
Negative Logits
Token      Logit
endra      -0.15
yny        -0.14
arrant     -0.14
Laden      -0.14
optera     -0.14
staking    -0.14
ertz       -0.14
estate     -0.14
earing     -0.14
Ir         -0.13
Positive Logits
Token      Logit
s          0.20
shaw       0.16
uck        0.15
illo       0.15
loe        0.14
immel      0.14
izz        0.14
nel        0.14
nyder      0.14
ont        0.14
Activations Density: 0.033%