INDEX
Explanations
discussions related to accountability and responsibility
references to significant events, actions, or dynamics involving individuals or groups
Negative Logits
Notting    -0.61
CCTV       -0.59
Surrey     -0.58
Canaver    -0.58
MDMA       -0.56
ibaba      -0.54
Loud       -0.54
Slater     -0.53
Kabul      -0.53
itsch      -0.52
Positive Logits
Others          1.10
ones            0.96
those           0.93
others          0.89
exceptions      0.87
these           0.84
Others          0.83
Similarly       0.80
Other           0.77
replacements    0.77
Activations Density 1.229%