Explanations
mentions of specific individuals involved in political controversies or crimes
Negative Logits
Token        Logit
laps         -0.66
Clicker      -0.66
FACE         -0.65
smarter      -0.62
nesday       -0.60
龍契士       -0.59
programmed   -0.57
herein       -0.57
glers        -0.56
Replay       -0.56
Positive Logits
Token        Logit
igans         1.18
thal          1.00
igan          0.98
azard         0.93
ov            0.91
istic         0.88
agan          0.88
istan         0.87
ovember       0.86
irez          0.86
Activation density: 0.026%
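As a rough reading aid, here is a minimal sketch of how a density figure like this could be computed. It assumes that activation density means the percentage of token positions at which the feature's activation is above zero; this page does not define the term, so treat that as an assumption. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the activation exceeds threshold.

    Assumption: "density" counts a position as firing when its activation
    is strictly greater than `threshold` (hypothetical parameter, default 0).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Under that reading, a 0.026% density means the feature fires on roughly
# one token in every 1 / 0.00026 tokens.
tokens_per_firing = 1 / (0.026 / 100)
print(round(tokens_per_firing))  # 3846
```

For example, `activation_density([0.0, 0.0, 1.3, 0.0])` returns `25.0`, since one of the four positions fires. At 0.026%, the feature is very sparse: it is active on only about one token in 3,846.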