Explanations
prevalence proportioning commonality
Negative Logits
femei          -0.93
oameni         -0.85
or             -0.84
devemos        -0.83
agresión       -0.82
fallecimiento  -0.82
ンの           -0.82
attraverso     -0.81
alcuni         -0.79
através        -0.79
Positive Logits
(~             1.93
$(             1.89
(              1.80
($             1.73
(<             1.62
(>             1.54
(£             1.30
($\            1.24
(€             1.22
$(             1.19
Activations Density: 0.309%