INDEX
Explanations
words related to detention or confinement
Negative Logits
xual                 -0.88
manship              -0.83
ocene                -0.79
#$                   -0.74
nown                 -0.72
proportions          -0.71
Weasley              -0.69
bold                 -0.68
inventoryQuantity    -0.68
fallacy              -0.68
Positive Logits
ention               1.34
ailed                1.26
rans                 1.23
ected                1.19
roit                 1.18
ector                1.17
ective               1.13
achable              1.12
ection               1.09
ainer                1.08
Activations Density: 0.047%