INDEX
Explanations
verbs expressing actions or comparisons
phrases that describe analogies or comparisons involving actions
Negative Logits
ersen      -0.71
ses        -0.63
ifer       -0.62
barring    -0.62
avage      -0.61
ère        -0.60
showing    -0.59
OUND       -0.59
lease      -0.59
ending     -0.58
Positive Logits
oneself    1.22
Yourself   0.93
yourself   0.82
Pengu      0.69
strangers  0.66
toget      0.65
immersed   0.64
ocial      0.62
ãĥ³ãĤ¸     0.61
omorphic   0.61
Activations Density: 0.362%