INDEX
Explanations
terms related to negative events or actions with significant impact
instances of the word "blow" in various contexts
Negative Logits
ript     -0.89
Malays   -0.68
Beir     -0.65
iosity   -0.65
nesota   -0.65
kson     -0.64
ively    -0.63
Qian     -0.62
Norn     -0.61
herty    -0.60
Positive Logits
hard     1.17
gun      1.08
guns     1.03
job      1.01
blow     0.96
pipe     0.94
jobs     0.94
out      0.93
hole     0.92
outs     0.91
Activations Density 0.022%
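
The density figure above can be read as the share of token positions on which the feature fires at all. The sketch below illustrates that reading under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical, and the dashboard that produced this page may compute the value differently.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions with a
    non-zero activation. This definition is illustrative, not confirmed
    by the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 22 active positions out of 100,000 tokens.
acts = [0.0] * 99_978 + [1.5] * 22
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.022%
```

A density this low means the feature is silent on almost every token, which fits a feature tied to one word ("blow") and its common continuations.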