Explanations
references to specific years and historical milestones
Negative Logits
lisi            -0.17
並              -0.14
rei             -0.14
raud            -0.14
aser            -0.14
aru             -0.14
preferredStyle  -0.14
ruba            -0.13
atives          -0.13
utral           -0.13
Positive Logits
when            0.23
when            0.21
cuando          0.16
khi             0.16
When            0.15
عندما           0.15
яд              0.15
σε              0.15
حين             0.15
quando          0.15
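The two logit lists above rank vocabulary tokens by how strongly this feature pushes them up or down. A minimal sketch of one common way such lists are produced, assuming the score for each token is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix (the dashboard may also fold in layer norm or other scaling, which this sketch ignores). The function and parameter names (`logit_effects`, `decoder_dir`, `unembed`) are hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of a feature (hypothetical helper).

    decoder_dir: list of d_model floats, the feature's decoder direction (assumed)
    unembed: d_model rows, each a list with one float per vocabulary token (W_U)
    vocab: token strings, one per column of unembed
    Returns (positive, negative): the k highest- and k lowest-scoring
    (token, score) pairs, each list ordered from most extreme inward.
    """
    d_model = len(decoder_dir)
    # score[t] = sum_i decoder_dir[i] * unembed[i][t]
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return positive, negative


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
pos, neg = logit_effects(
    decoder_dir=[1.0, 0.0],
    unembed=[[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]],
    vocab=["when", "lisi", "cuando"],
    k=1,
)
print(pos, neg)
```

Because only the first decoder coordinate is nonzero in the toy example, the second unembedding row has no effect on the ranking.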
Activation density: 0.069%
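The activation density figure above is read here as the fraction of token positions on which the feature fires. A minimal sketch under that assumption, treating "fires" as an activation strictly above a threshold of zero (the dashboard's exact threshold is an assumption). The name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    activations: one feature activation value per token position.
    threshold: assumed firing cutoff; 0.0 counts any positive activation.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# A density of 0.069% corresponds to roughly 69 firing positions per 100,000 tokens.
acts = [0.0] * 100_000
for i in range(69):
    acts[i * 1000] = 1.0   # mark 69 positions as firing
print(f"{activation_density(acts) * 100:.3f}%")
```

At this density the feature is very sparse: it is silent on well over 99.9% of tokens.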