INDEX
Explanations
references to police-related incidents or discussions
Negative Logits
pitch   -0.15
pitch   -0.15
arty    -0.15
Pitt    -0.15
Peters  -0.14
Hatch   -0.14
lit     -0.14
von     -0.14
vom     -0.14
Ernst   -0.14
Positive Logits
boru    0.16
/cop    0.15
liš     0.14
uka     0.14
าก      0.14
巡      0.14
YPD     0.14
udence  0.14
ieber   0.14
ovol    0.14
Activations Density 0.065%