INDEX
Explanations
words related to negative or insulting behavior
negative or derogatory terms related to individuals
Negative Logits
spring        -0.76
Maintenance   -0.67
Source        -0.66
Source        -0.63
PowerPoint    -0.62
Accuracy      -0.62
Fathers       -0.61
Annotations   -0.61
Spiritual     -0.60
Authority     -0.60
Positive Logits
jer       1.26
jerk      1.10
usalem    1.02
boa       1.01
ometer    0.88
¶ħ        0.86
nesday    0.85
icho      0.84
ometers   0.81
bucks     0.81
Activations Density: 0.007%