Explanations
references to criminal behavior and legal cases involving suspects
Negative Logits
Token      Logit
dden       -0.18
ucch       -0.16
iba        -0.15
inas       -0.15
لمه        -0.15
fallen     -0.15
uentes     -0.14
Fallen     -0.14
689        -0.14
fal        -0.14
Positive Logits
Token      Logit
amb        0.24
slit       0.24
targeted   0.19
committed  0.19
attacked   0.18
targeting  0.18
corner     0.18
target     0.18
prey       0.17
COMMIT     0.17
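The two lists above are ranked views of one set of per-token values: the most negative entries and the most positive entries. A minimal sketch of that ranking step, assuming the values are already available as a plain token-to-logit mapping. The function name `top_and_bottom` and the dict input are illustrative assumptions, not the dashboard's actual code, and only a few of the listed values are used as sample data:

```python
import heapq


def top_and_bottom(
    logits: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, logit) pairs.

    Hypothetical helper for illustration only.
    """
    # Most negative first: the k smallest values, in ascending order.
    bottom = heapq.nsmallest(k, logits.items(), key=lambda kv: kv[1])
    # Most positive first: the k largest values, in descending order.
    top = heapq.nlargest(k, logits.items(), key=lambda kv: kv[1])
    return bottom, top


# Sample values copied from the lists above (a small subset).
sample = {
    "dden": -0.18, "ucch": -0.16, "iba": -0.15,
    "amb": 0.24, "targeted": 0.19, "prey": 0.17,
}
bottom, top = top_and_bottom(sample, k=2)
print("Negative:", bottom)  # [('dden', -0.18), ('ucch', -0.16)]
print("Positive:", top)     # [('amb', 0.24), ('targeted', 0.19)]
```

Using `heapq.nsmallest` / `heapq.nlargest` avoids sorting the whole vocabulary when only the top few entries are displayed.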
Activation Density: 0.084%
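Activation density is read here as the share of sampled tokens on which the feature fires; at 0.084%, that is roughly 84 tokens in every 100,000. A minimal sketch, assuming "fires" means an activation strictly greater than zero and that activations arrive as a flat list. Both are assumptions, since the dashboard's exact threshold and token sample are not stated here, and `activation_density` is a hypothetical helper:

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper; the threshold of 0.0 is an assumption.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical example: 84 nonzero activations among 100,000 tokens.
acts = [0.0] * 100_000
for i in range(84):
    acts[i * 1000] = 1.0
print(f"{activation_density(acts):.3f}%")  # 0.084%
```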