Explanations
phrases related to hindering or preventing something
words associated with prevention or deterrence
Negative Logits
olini        -0.81
ioch         -0.74
rooft        -0.70
ammy         -0.70
halls        -0.68
iop          -0.67
ocalypse     -0.65
Patriarch    -0.65
oway         -0.63
enhagen      -0.63
Positive Logits
ministic     1.77
minist       1.42
rence        0.97
gent         0.96
ior          0.95
ring         0.91
ply          0.91
red          0.88
deter        0.85
ple          0.85
Activations Density 0.015%