INDEX
Explanations
phrases that express ease or difficulty
phrases that express simplicity or ease in performing actions
Negative Logits
Dur        -0.81
Relief     -0.69
ELD        -0.68
allowed    -0.68
DAQ        -0.66
Balls      -0.66
cedented   -0.66
favors     -0.64
Brach      -0.62
eting      -0.62
POSITIVE LOGITS
forget         1.18
mistake        1.05
confuse        1.02
miscon         1.01
navigate       1.01
identify       1.00
overlook       0.96
visualize      0.96
learn          0.95
misinterpret   0.95
Activations Density 0.075%