INDEX
Explanations
references to specific authors or citations in the text
Negative Logits
anium     -0.15
Åį        -0.15
agua      -0.14
äft       -0.14
ovo       -0.14
uestas    -0.14
éru       -0.14
acam      -0.14
езда      -0.14
Tomáš     -0.13
Positive Logits
et        0.29
̧         0.18
08        0.18
06        0.18
201       0.18
05        0.17
07        0.17
iÄĩ       0.17
09        0.16
03        0.16
Activations Density 0.156%