Explanations
References to the second person "you" in various contexts.
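Negative Logits
their  -0.50
檚  -0.48
她们  -0.46  (Chinese, simplified: "they", feminine)
Their  -0.45
Leurs  -0.45  (French: "their")
她們  -0.44  (Chinese, traditional: "they", feminine)
leurs  -0.44  (French: "their")
Their  -0.42
他們的  -0.41  (Chinese, traditional: "their")
他们的  -0.39  (Chinese, simplified: "their")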
Positive Logits
guys  1.54
yourself  1.32
yourselves  1.02
guys  0.96
yourself  0.94
tubers  0.94
GUYS  0.90
Guys  0.85
Guys  0.82
Yourself  0.82
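The two lists above show the tokens whose logits this feature pushes down most and up most. The sketch below is one way to produce such lists. It assumes, though this page does not say so, that each token's score is the feature's decoder direction projected through the model's unembedding matrix. The names `decoder_direction`, `W_U`, `logit_effects`, and `top_logit_effects` are all hypothetical.

```python
import numpy as np


def logit_effects(decoder_direction: np.ndarray, W_U: np.ndarray) -> np.ndarray:
    """Project a feature's decoder direction onto the vocabulary.

    Assumption: decoder_direction has shape (d_model,) and W_U (the
    unembedding matrix) has shape (d_model, vocab_size). Both names are
    hypothetical; the page does not say how its scores were computed.
    """
    return decoder_direction @ W_U  # one score per vocabulary token


def top_logit_effects(scores: np.ndarray, vocab: list[str], k: int = 10):
    """Return the k lowest-scoring and k highest-scoring (token, score) pairs."""
    order = np.argsort(scores)  # indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy usage with a few values copied from the lists above.
vocab = ["their", "guys", "yourself", "the"]
scores = np.array([-0.50, 1.54, 1.32, 0.0])
negative, positive = top_logit_effects(scores, vocab, k=2)
print(negative)  # [('their', -0.5), ('the', 0.0)]
print(positive)  # [('guys', 1.54), ('yourself', 1.32)]
```

The results are kept as lists of pairs rather than a dict keyed by token string. Several strings on this page, such as "guys" and "Their", appear more than once, and a dict would collapse those repeats into one entry.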
Activation density: 0.199%
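This sketch reads activation density as the percentage of tokens on which the feature fires. That definition is an assumption, since the page gives only the number. Under it, 0.199% works out to roughly 2 tokens in every 1,000. The function name `activation_density` and the zero threshold are hypothetical choices.

```python
import numpy as np


def activation_density(activations: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    return 100.0 * float(np.mean(activations > threshold))


# Toy check: 2 active tokens out of 1,000 gives about 0.2%, close to this feature's 0.199%.
acts = np.zeros(1000)
acts[:2] = 1.0
print(activation_density(acts))
```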