INDEX
Explanations
the presence of the word "most" and related phrases indicating the majority
Negative Logits
Token       Logit
riwal       -0.63
leyeb       -0.58
נה          -0.58
druk        -0.57
Be          -0.56
anueva      -0.53
zeitung     -0.53
lamabad     -0.52
beqa        -0.52
'])){       -0.51
Positive Logits
Token         Logit
maioria       1.08
meisten       1.07
plupart       1.07
большинство   1.06
flesta        1.03
большин       0.98
Mostly        0.96
çoğu          0.95
meeste        0.94
mayoría       0.94
Activations Density 0.226%
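
The density figure is not defined on this page. A common reading is the fraction of sampled tokens on which the feature activates at all. The sketch below works under that assumption; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a non-zero
    activation. The dashboard itself does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 tokens, the feature fires on 2 of them.
sample = [0.0] * 998 + [3.1, 0.7]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.226% would mean the feature fires on roughly 2 to 3 tokens out of every 1000 sampled.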