INDEX
Explanations
references to specific years and events in history
Negative Logits
поба      -0.19
urum      -0.18
зависим   -0.17
reich     -0.16
ignum     -0.15
eldre     -0.14
macros    -0.14
unning    -0.14
avaş      -0.14
Lucas     -0.14
Positive Logits
na        0.28
tu        0.25
w         0.19
Tu        0.19
Tu        0.18
przez     0.18
już       0.18
po        0.17
tu        0.17
Na        0.17
Activations Density 0.039%