INDEX
Explanations
phrases related to taking decisive actions or making significant decisions
phrases that imply extraction or removal
Negative Logits
Token        Logit
orously      -0.83
orld         -0.78
ould         -0.73
esa          -0.71
staking      -0.70
shaw         -0.68
pton         -0.67
umbered      -0.67
etimes       -0.65
ingham       -0.64
Positive Logits
Token        Logit
stretched    0.98
levers       0.74
wards        0.72
rage         0.69
doors        0.67
ta           0.65
casts        0.63
microphones  0.63
stitches     0.62
WARD         0.62
Activations Density: 0.028%