INDEX
Explanations
phrases indicating similarity or comparison
phrases that indicate similarity or comparison
Negative Logits
oust      -0.89
alt       -0.89
utical    -0.88
rax       -0.86
otype     -0.84
inion     -0.84
itles     -0.83
otypes    -0.79
rouse     -0.78
oscope    -0.77
Positive Logits
somebody   0.88
lier       0.86
someone    0.86
something  0.84
fireworks  0.83
everybody  0.79
everyone   0.79
goodbye    0.78
they       0.77
fun        0.77
Activations Density: 0.042%