Explanations
Occurrences of the word "rob" and its variants, indicating themes of theft or robbery.
Negative Logits
bert      -0.20
p         -0.20
z         -0.18
park      -0.18
b         -0.18
pom       -0.18
m         -0.18
pv        -0.18
erton     -0.17
istic     -0.17
Positive Logits
sy              0.22
tt              0.21
oster           0.19
ssa             0.18
sin             0.17
ta              0.17
ro              0.17
Украї           0.17
ving            0.16
requestOptions  0.16
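Dashboards of this kind commonly build the negative and positive logit lists by projecting the feature's decoder direction onto the model's unembedding matrix and reading off the lowest and highest scores. The sketch below assumes that convention; `logit_effects`, its arguments, the `d_model x vocab_size` layout of `unembed`, and the toy numbers are hypothetical and not taken from this page.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: score every vocab token by decoder_dir @ unembed.

    Assumes unembed is laid out as d_model rows x vocab_size columns,
    so column j holds the unembedding vector for vocab[j].
    Returns (top_negative, top_positive), each a list of (token, score).
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per model dimension")
    if any(len(row) != len(vocab) for row in unembed):
        raise ValueError("each unembed row must have one entry per vocab token")

    # Dot product of the decoder direction with each token's unembedding column.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]        # most suppressed tokens first
    top_positive = ranked[::-1][:k]  # most promoted tokens first
    return top_negative, top_positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up values).
    vocab = ["bert", "sy", "rob", "the"]
    decoder_dir = [1.0, -0.5]
    unembed = [
        [-0.1, 0.2, 0.3, 0.0],
        [0.2, -0.1, 0.4, 0.0],
    ]
    neg, pos = logit_effects(decoder_dir, unembed, vocab, k=2)
    print([tok for tok, _ in neg])  # ['bert', 'the']
    print([tok for tok, _ in pos])  # ['sy', 'rob']
```

Sorting once and slicing both ends keeps the negative and positive lists consistent with each other; a real model would use a matrix library rather than nested Python loops.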
Activation density: 0.006%
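An activation density of 0.006% means the feature fires on roughly 6 of every 100,000 tokens. A minimal sketch of that calculation, assuming density is the percentage of tokens whose activation is strictly above a threshold of 0; the function name and the threshold are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens where the feature fires.

    A token counts as firing when its activation is strictly greater
    than `threshold` (assumed default: 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 6 firing tokens out of 100,000 (made-up activations).
    acts = [0.0] * 99_994 + [1.5] * 6
    print(activation_density(acts))  # 0.006
```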