Explanations
incidents of violence and crime
Negative Logits
Token         Logit
же            -0.17
kå            -0.15
ourg          -0.15
artık         -0.15
avier         -0.14
imo           -0.14
prostituer    -0.13
optionally    -0.13
oise          -0.13
arkin         -0.13
Positive Logits
Token         Logit
while          0.36
whilst         0.29
while          0.29
during         0.27
WHILE          0.25
_while         0.25
minutes        0.24
after          0.24
moments        0.23
While          0.23
Activation Density: 0.266%