Explanations
references to statistics and comparisons regarding different groups or situations
references to the increase in quantifiable phenomena related to societal issues
Negative Logits
Lazarus   -0.74
Abdel     -0.72
CHAT      -0.67
cknow     -0.67
Whip      -0.66
RAW       -0.65
ij士      -0.64
Jinn      -0.63
eston     -0.61
ALSE      -0.60
Positive Logits
than      1.54
than      1.23
Than      0.98
erous     0.85
catentry  0.80
efficient 0.77
bang      0.75
hassle    0.73
worthy    0.71
sidx      0.69
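Each logit list pairs a vocabulary token with a signed value. Feature dashboards commonly produce such lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most negative and k most positive entries. This page does not state its method, so treat that as an assumption. The sketch below uses plain Python with hypothetical names (logit_effects, top_bottom_k) and toy inputs rather than real model weights.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], w_u: List[List[float]]) -> List[float]:
    """Project a (hypothetical) feature decoder direction through an unembedding.

    w_u is laid out as [d_model][vocab]; the result has one value per vocab token.
    """
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][v] for i in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_bottom_k(
    effects: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative, most positive) token/value pairs, k of each."""
    order = sorted(range(len(effects)), key=lambda v: effects[v])  # ascending
    negative = [(vocab[v], effects[v]) for v in order[:k]]
    positive = [(vocab[v], effects[v]) for v in reversed(order[-k:])]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (illustrative only).
    effects = logit_effects([1.0, 0.5], [[1.0, -1.0, 0.0], [2.0, 0.0, -4.0]])
    print(top_bottom_k(effects, ["than", "Lazarus", "x"], 1))
```

The function works on token indices rather than token strings, so two entries that print identically (as "than" does twice above, likely because leading whitespace was lost) are still kept apart.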
Activations Density 0.202%
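Activation density is usually reported as the percentage of sampled tokens on which the feature fires. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. Both that default and the function name activation_density are hypothetical. Under this reading, 0.202% means the feature is active on roughly 2 of every 1,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds the (assumed) threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 2 firing tokens out of 1,000 gives a density of 0.2%.
    sample = [0.0] * 998 + [1.3, 0.4]
    print(f"{activation_density(sample):.3f}%")
```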