INDEX
Explanations
instances of the word "leverage"
terms related to influence or power dynamics
Negative Logits
Token    Logit
ike      -0.85
nen      -0.79
olog     -0.78
rt       -0.74
ART      -0.71
ago      -0.69
liam     -0.69
ridge    -0.68
nee      -0.68
abol     -0.65
Positive Logits
Token          Logit
leverage       1.37
levers         1.09
ibly           0.91
leveraging     0.91
lever          0.84
clout          0.83
ible           0.82
ibility        0.80
IBLE           0.79
microphones    0.77
Activations Density 0.007%
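
To give a sense of scale for the density figure, the short sketch below converts it into an expected count of active tokens. It assumes that "Activations Density" means the percentage of tokens on which the feature has a nonzero activation; the dashboard does not state this, and the function name and the one-million-token sample size are hypothetical, chosen only for illustration.

```python
def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the feature fires.

    Assumption: density_percent is the share of tokens with a nonzero
    activation, expressed in percent (0.007 means 0.007%).
    """
    return density_percent / 100.0 * n_tokens


# Hypothetical sample size of one million tokens.
per_million = expected_active_tokens(0.007, 1_000_000)
print(round(per_million))  # about 70 tokens per million
```

Under that reading the feature is very sparse: it fires on roughly 7 of every 100,000 tokens, which fits a feature tied to a single word family such as "leverage".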