Explanations
names of characters in movies or TV series
instances of opening parentheses in text
Negative Logits
retard      -0.87
nutrient    -0.75
parity      -0.73
nuts        -0.72
lull        -0.72
depress     -0.71
irds        -0.70
electr      -0.69
hazards     -0.69
olicy       -0.68
Positive Logits
sic         1.51
formerly    1.43
pictured    1.36
aka         1.29
?)          1.28
via         1.26
?),         1.20
who         1.19
!)          1.17
which       1.17
Activations Density: 0.183%