Explanations
words related to negative impacts or setbacks
references to negative impacts or setbacks
Negative Logits
Token     Logit
ript      -0.74
cius      -0.73
ately     -0.69
RY        -0.66
facult    -0.66
uana      -0.65
nesota    -0.64
ordan     -0.64
ently     -0.64
rosse     -0.63
Positive Logits
Token     Logit
hole      1.03
gun       1.00
guns      0.95
holes     0.95
job       0.92
blow      0.89
jobs      0.88
outs      0.87
waves     0.87
out       0.87
Activations Density: 0.011%