INDEX
Explanations
questioning statements or inquiries about various topics
NEGATIVE LOGITS
them      -0.96
them      -0.84
these     -0.81
这两个    -0.71
These     -0.70
These     -0.70
those     -0.69
Those     -0.68
these     -0.68
you       -0.67
POSITIVE LOGITS
anyone     0.97
anybody    0.89
everyone   0.82
everybody  0.79
anyone     0.75
Anybody    0.74
Anyone     0.67
ANYONE     0.66
Everyone   0.65
Anybody    0.64
Activations Density 0.167%