INDEX
Explanations
phrases related to societal rules and governance
Negative Logits
的人    -0.18
erval   -0.16
の人    -0.16
iego    -0.15
ibt     -0.15
appa    -0.15
чий     -0.15
edla    -0.14
iedo    -0.14
مام     -0.14
Positive Logits
them    0.18
stroy   0.15
有的    0.15
they    0.14
631     0.14
сами    0.14
634     0.14
703     0.14
ienen   0.14
Affero  0.14
Activations Density: 0.289%