Explanations
proper nouns, particularly names of individuals or entities
Negative Logits
betweenstory  -1.06
Personendaten  -1.00
aarrggbb  -0.92
DoubleQuotes  -0.92
CURIAM  -0.90
itſelf  -0.90
تضيفلها  -0.87
writeField  -0.86
виправивши  -0.85
kaarangay  -0.84
Positive Logits
o  0.45
m  0.44
w  0.42
se  0.40
black  0.39
şi  0.39
Gew  0.39
p  0.39
o  0.38
la  0.38
Activations Density: 0.007%