Explanations
phrases expressing concern or references to historical context and societal issues
Negative Logits
ubar     -0.15
тро      -0.14
UED      -0.13
CISION   -0.13
urd      -0.12
zij      -0.12
ulo      -0.12
ンチ     -0.12
っき     -0.12
assen    -0.12
Positive Logits
us         1.40
me         0.74
нами       0.67
我们       0.65
-us        0.60
nosotros   0.59
us         0.59
we         0.58
Us         0.57
nous       0.56
Activations Density: 1.568%