INDEX
Explanations
references to numerical values or quantities
Negative Logits
hel            -0.16
bor            -0.15
ozo            -0.15
.Components    -0.15
punch          -0.15
iju            -0.14
bour           -0.14
ILA            -0.14
важа           -0.14
utton          -0.14
Positive Logits
eworld         0.16
iggins         0.14
icer           0.14
egl            0.14
ucas           0.14
erville        0.14
Į              0.13
idot           0.13
erton          0.13
елен           0.13
Activations Density: 0.033%