INDEX
Explanations
occurrences of violence or weapon-related themes
Negative Logits
Verfüg    -0.15
Gap       -0.14
Mul       -0.14
mul       -0.14
ureau     -0.14
erval     -0.14
gå        -0.14
pog       -0.14
alfa      -0.13
ittest    -0.13
Positive Logits
found          0.23
found          0.23
discovery      0.22
-find          0.21
finds          0.21
discovering    0.20
discovered     0.20
find           0.19
discovers      0.19
discoveries    0.19
Activations Density: 0.121%