Explanations
comparisons or similarities between different concepts
phrases that indicate similarity comparisons
Negative Logits
stoked          -0.69
tun             -0.67
raq             -0.67
danced          -0.63
gered           -0.63
resy            -0.62
helicop         -0.61
uve             -0.60
contrace        -0.60
transitioned    -0.60
Positive Logits
ours            0.84
lihood          0.78
oxide           0.75
ffee            0.71
rium            0.69
èª              0.67
theirs          0.65
those           0.62
the             0.62
traditional     0.62
Activations Density: 0.174%