Explanations
references to social structure and dynamics within communities
Negative Logits
is          -0.30
isn         -0.28
έχει        -0.28
دارد        -0.27
ندارد       -0.25
является    -0.24
—is         -0.24
خواهد       -0.23
has         -0.23
has         -0.22
Positive Logits
were        0.91
were        0.73
weren       0.72
Were        0.71
Were        0.66
waren       0.53
были        0.53
بودند       0.51
fueron      0.49
wurden      0.47
Activations Density 0.423%