Explanations
references to gatherings or social events
Negative Logits
¤ëĭ¤    -0.16
دارم    -0.15
دارد    -0.15
hebt    -0.14
けない    -0.14
существует    -0.14
iddi    -0.14
зависим    -0.14
ывается    -0.14
ается    -0.14
POSITIVE LOGITS
were    0.90
were    0.76
Were    0.74
Were    0.72
weren    0.65
были    0.64
waren    0.59
بودند    0.57
étaient    0.55
були    0.53
Activations Density 0.466%