Explanations
highly negative or controversial statements/actions in text related to schools, race, international conflicts, and political figures
Negative Logits
RAFT       -0.78
rex        -0.67
romy       -0.63
rn         -0.63
midt       -0.61
staking    -0.61
iership    -0.61
romeda     -0.61
(){        -0.61
ean        -0.60
Positive Logits
xious       1.31
longer      1.10
doubt       1.07
matter      1.01
except      1.00
shortage    0.94
ct          0.94
indication  0.89
obs         0.87
discern     0.86
Activation Density: 0.831%