INDEX
Explanations
references to environmental activism and political efforts
Negative Logits
Specifier    -0.18
ÌĢ           -0.17
pcm          -0.17
ãĤ           -0.17
iginal       -0.16
.Interop     -0.15
IGNORE       -0.14
τηση         -0.14
ربه          -0.14
emann        -0.14
Positive Logits
ourselves    0.27
our          0.27
yourselves   0.25
our          0.21
ours         0.20
我们的       0.18
nossa        0.18
nuestra      0.17
нашей        0.17
nosso        0.17
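Some tokens in lists like these appear in a byte-level display alphabet rather than as readable text (for example `ÌĢ` or `ãĤ`). The sketch below shows one way to map such strings back to text. It assumes the tokenizer uses the GPT-2 byte-to-character table, which this page does not confirm; `decode_token` is a hypothetical helper name, and characters that are already readable are passed through unchanged.

```python
def bytes_to_unicode():
    """Build the GPT-2 byte -> display-character table (assumed convention)."""
    # Bytes that are shown as themselves.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code point 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: display character -> original byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display):
    """Hypothetical helper: turn a displayed token string back into text."""
    raw = bytearray()
    for ch in display:
        if ch in BYTE_DECODER:
            raw.append(BYTE_DECODER[ch])
        else:
            # Character was already decoded upstream; keep its UTF-8 bytes.
            raw.extend(ch.encode("utf-8"))
    # Tokens that stop mid-character decode to U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


# The Cyrillic entry as it appears in byte-level form ("на" + Ñ + Ī + "ей").
sample = "\u043d\u0430\u00d1\u012a\u0435\u0439"
print(decode_token(sample))  # нашей
```

Under this assumption, a token such as `ãĤ` holds only the first two bytes of a three-byte UTF-8 character, so it has no standalone readable form.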
Activations Density 0.108%