INDEX
Explanations
mentions of violent or controversial events
references to the term "bath" in various contexts
Negative Logits
Ob        -0.82
IFA       -0.72
RY        -0.70
BT        -0.70
HI        -0.70
Democr    -0.69
APD       -0.69
MAN       -0.69
ICT       -0.68
KI        -0.68
Positive Logits
…)        0.84
…         0.78
estyles   0.75
ゼウス    0.70
...)      0.68
swall     0.67
FML       0.66
…"        0.64
estead    0.63
hers      0.63
Activations Density 0.000%