INDEX
Explanations
names of movies or TV shows
Negative Logits
glers       -0.75
FP          -0.69
REF         -0.65
McCabe      -0.63
gerald      -0.60
Labrador    -0.60
åŃ          -0.59
climate     -0.58
phosphorus  -0.58
в           -0.57
Positive Logits
arkable     1.26
nants       1.23
ovable      1.21
oving       1.19
ittance     1.19
arks        1.11
ainer       1.11
edy         1.11
nant        1.10
ainers      1.08
Activations Density 0.010%