INDEX
Explanations
phrases related to negative impacts or setbacks
phrases indicating negative impacts or consequences
Negative Logits
ript      -0.81
iosity    -0.68
RY        -0.67
uana      -0.67
cius      -0.67
Malays    -0.67
ively     -0.65
phis      -0.63
âĸ¬       -0.62
ately     -0.62
Positive Logits
hole      1.04
gun       0.93
holes     0.92
job       0.90
hard      0.90
pipe      0.88
blow      0.88
jobs      0.88
outs      0.87
waves     0.86
Activations Density 0.021%