INDEX
Explanations
references to counterparts or equivalents in various contexts
Negative Logits
tics         -0.17
burgh        -0.16
ULAR         -0.15
tic          -0.14
ĤŃ           -0.14
croft        -0.14
BY           -0.14
zin          -0.14
WithOptions  -0.14
omet         -0.13
Positive Logits
ies   0.15
jes   0.14
709   0.14
rons  0.14
ages  0.13
916   0.13
(s    0.13
Chan  0.13
sth   0.13
xbd   0.13
Activations Density: 0.005%