INDEX
Explanations
phrases related to hindering or preventing something
concepts related to prevention or deterrence
Negative Logits
olini       -0.76
rooft       -0.71
Patriarch   -0.70
ioch        -0.69
ocalypse    -0.69
oln         -0.66
ocene       -0.65
enhagen     -0.65
halls       -0.64
oway        -0.64
Positive Logits
ministic    1.82
minist      1.51
rence       1.02
ior         0.97
gent        0.96
red         0.91
ried        0.89
ring        0.87
ply         0.86
rer         0.84
Activations Density: 0.025%