INDEX
Explanations
phrases related to discussing or exploring the reasons behind the quality or uniqueness of a particular subject
phrases or questions about what constitutes quality or distinctiveness in various contexts
Negative Logits
nery        -0.68
nec         -0.68
Sov         -0.68
loader      -0.67
imation     -0.66
ima         -0.65
jury        -0.63
usterity    -0.62
ALTH        -0.61
imm         -0.59
Positive Logits
them            0.90
him             0.84
us              0.84
these           0.78
me              0.76
this            0.71
distinguishes   0.66
THEM            0.65
people          0.62
distinguish     0.61
Activations Density: 0.078%