INDEX
Explanations
articles preceding nouns
Negative Logits
vn       -0.18
ero      -0.17
ose      -0.16
st       -0.15
lf       -0.14
isms     -0.14
zing     -0.14
atics    -0.13
ismo     -0.13
ism      -0.13
Positive Logits
lein       0.19
portion    0.16
portions   0.16
existed    0.14
further    0.14
ided       0.14
909        0.14
exists     0.14
upo        0.14
edis       0.14
Activations Density: 0.100%