Explanations
phrases that convey important actions or decisions impacting groups or communities
Negative Logits
ulant         -0.16
ueur          -0.15
ิบ            -0.14
Intercept     -0.14
gers          -0.13
于是          -0.13
nth           -0.13
.IContainer   -0.13
ifacts        -0.13
itorio        -0.13
Positive Logits
ihn           0.20
Them          0.17
sie           0.16
sie           0.16
THEM          0.16
eux           0.16
alla          0.15
ergus         0.15
其            0.15
them          0.15
Activations Density 0.172%