INDEX
Explanations
proper nouns, especially names and locations
Negative Logits
-Russian        -0.16
/read           -0.15
awah            -0.15
ramifications   -0.15
rigorous        -0.15
ight            -0.15
routine         -0.15
ights           -0.15
liers           -0.14
Body            -0.14
Positive Logits
efined          0.24
asmus           0.20
ourke           0.20
ansom           0.18
cliffe          0.18
entrant         0.17
nant            0.17
aldo            0.16
uffles          0.16
naissance       0.16
Activations Density: 1.405%