INDEX
Explanations
the presence of proper nouns and titles in the text
Negative Logits
wich        -0.16
ombo        -0.16
modifiable  -0.15
umbo        -0.14
wards       -0.14
653         -0.14
ensis       -0.14
iron        -0.13
dy          -0.13
freeze      -0.13
Positive Logits
ãĥ¼ãĥĢ      0.18
ugi         0.16
æ¸Ī         0.15
ubl         0.14
icion       0.14
iline       0.14
lerde       0.14
é           0.13
ewan        0.13
igen        0.13
Activations Density 0.707%