Explanations
phrases indicating varying levels of ease or difficulty for different tasks
phrases that describe the ease or difficulty of tasks
Negative Logits
agnar     -0.65
vernment  -0.65
aldo      -0.61
eor       -0.60
uclear    -0.60
older     -0.59
overe     -0.59
Patri     -0.58
arling    -0.58
arlane    -0.58
Positive Logits
chore     0.75
.?        0.74
.–        0.72
.」       0.70
¶         0.68
fraught   0.68
›         0.67
.",       0.66
bie       0.66
.         0.65
Activations Density 0.323%