INDEX
Explanations
phrases indicating determination or effort to accomplish a task
phrases expressing determination and effort
Negative Logits
UR          -0.67
Politics    -0.67
eur         -0.63
bent        -0.61
EUR         -0.59
prints      -0.58
vard        -0.57
rejection   -0.57
IB          -0.57
Ancest      -0.57
Positive Logits
't          1.08
berra       1.08
muster      1.04
feas        1.02
afford      1.00
adian       0.90
NOT         0.86
emulate     0.86
capitalize  0.83
nesota      0.83
Activations Density: 0.068%