Explanations
terms indicating collective identity or inclusivity
Negative Logits
Token    Logit
境的     -0.68
先の     -0.66
});      -0.66
:'/      -0.65
@"";     -0.65
ないで   -0.63
先に     -0.63
an       -0.62
vos      -0.62
σ        -0.61
Positive Logits
Token         Logit
everyone      2.34
everyone      2.29
Everyone      2.21
Everyone      2.21
everybody     2.17
everybody     2.14
Everybody     2.10
EVERYONE      2.04
Everybody     2.04
everything    1.57
Activations Density 0.033%