Explanations
phrases emphasizing collective experiences and shared identities
Negative Logits
Token      Logit
出版年     -0.69
ărilor     -0.65
LETTER     -0.63
"/",       -0.59
ITTEN      -0.59
messer     -0.59
cba        -0.58
éron       -0.58
ormány     -0.57
للاسماء    -0.57
Positive Logits
Token      Logit
everyone   1.04
Everyone   0.99
everybody  0.97
everyone   0.96
Everyone   0.96
Everybody  0.92
Everybody  0.91
everybody  0.88
EVERYONE   0.86
Tutti      0.66
Activations Density 0.149%