INDEX
Explanations
verbs related to actions or experiences
comparisons and analogies to experiences or actions
Negative Logits
ersen        -0.70
ses          -0.67
arat         -0.64
gradation    -0.60
ifi          -0.60
ifer         -0.59
uti          -0.59
endant       -0.58
Reporting    -0.58
quad         -0.58
Positive Logits
oneself      1.21
Yourself     0.80
Pengu        0.78
strangers    0.76
yourself     0.73
onym         0.66
é¾įå         0.63
bare         0.62
CTRL         0.61
outdoors     0.59
Activations Density: 0.347%