INDEX
Explanations
phrases where the concept of "sameness" or similarity is emphasized
phrases that express similarity or equality
Negative Logits
WI        -0.73
rollers   -0.72
Ķ         -0.71
rection   -0.70
orts      -0.69
ronics    -0.69
arest     -0.68
rosso     -0.68
rique     -0.67
xtap      -0.67
Positive Logits
thing        1.29
way          1.00
exact        0.89
amount       0.88
size         0.86
kind         0.85
extent       0.81
anymore      0.81
regardless   0.81
sort         0.80
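Dashboards of this kind commonly derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes that convention; it is not taken from this dashboard's code, and the function name `top_logit_tokens` plus the toy matrices are hypothetical, written in pure Python so the arithmetic stays visible.

```python
def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Rank vocabulary tokens by how strongly a feature direction promotes them.

    Hypothetical inputs (assumed shapes, not from the dashboard):
      decoder_dir: list of d_model floats, the feature's decoder row
      unembed:     d_model rows, each a list of vocab_size floats
      vocab:       list of vocab_size token strings
    Returns (top_positive, top_negative) as lists of (token, logit) pairs.
    """
    d_model = len(decoder_dir)
    # Logit for token j is the dot product of the decoder direction with column j.
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    top_negative = ranked[:k]          # most suppressed tokens, most negative first
    top_positive = ranked[::-1][:k]    # most promoted tokens, largest first
    return top_positive, top_negative


# Toy example with a 2-dimensional residual stream and 4 tokens.
vocab = ["thing", "way", "WI", "rollers"]
decoder_dir = [1.0, -0.5]
unembed = [
    [1.0, 0.5, -0.5, 0.2],
    [0.0, -0.5, 0.5, 1.0],
]
pos, neg = top_logit_tokens(decoder_dir, unembed, vocab, k=2)
print(pos)  # [('thing', 1.0), ('way', 0.75)]
print(neg)  # [('WI', -0.75), ('rollers', -0.3)]
```

In a real model the unembedding has tens of thousands of columns, so a vectorized matrix product would replace the nested loops; the ranking logic is the same.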
Activation Density: 0.045%
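Activation density is usually read as the fraction of token positions on which the feature fires, so 0.045% corresponds to roughly 4.5 active tokens per 10,000. The sketch below assumes "fires" means an activation strictly above zero; the function name `activation_density`, the `threshold` parameter, and the sample counts are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold.

    Assumes a non-empty list of per-token activations (hypothetical input format).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 9 firing tokens out of 20,000.
acts = [0.0] * 20_000
for idx in range(9):
    acts[idx * 1000] = 2.5
density = activation_density(acts)
print(f"{100 * density:.3f}%")  # 0.045%
```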