INDEX
Explanations
words related to opposition or negation
words and concepts related to antisocial behavior or attitudes
Negative Logits
Duchess       -0.78
Falls         -0.75
Dynamics      -0.70
Warfare       -0.67
Transaction   -0.67
IRO           -0.67
Chiefs        -0.66
Penet         -0.66
Bundy         -0.66
Glacier       -0.64
Positive Logits
pace          1.22
paces         1.19
ocial         1.17
earch         1.07
terday        1.00
chool         0.98
chwitz        0.98
peed          0.97
leep          0.97
cript         0.97
Activations Density 0.028%