INDEX
Explanations
references to murder and related criminal activities
Negative Logits
сылки          -0.15
ird            -0.15
lage           -0.15
asser          -0.14
381            -0.14
сыл            -0.14
REMOTE         -0.14
oha            -0.14
ersistence     -0.14
óln            -0.14
Positive Logits
ously          0.23
abilia         0.22
ous            0.22
-su            0.21
spree          0.19
joy            0.17
scenes         0.16
Scenes         0.16
rio            0.15
στη            0.15
Activations Density 0.027%
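
As a rough illustration of the density figure above, the sketch below assumes that activation density means the fraction of token positions on which the feature fires (activation above a threshold). The function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of positions on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which activate the feature.
acts = [0.0] * 10_000
acts[5] = 1.2
acts[42] = 0.4
acts[9_000] = 2.7

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.030%
```

Under this reading, a density of 0.027% would correspond to the feature firing on roughly 27 out of every 100,000 token positions.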