INDEX
Explanations
references to authors or creators and their works in a meta-textual context
NEGATIVE LOGITS
endir        -0.17
esian        -0.14
orne         -0.14
Ca           -0.14
roat         -0.13
eward        -0.13
metic        -0.13
義           -0.13
ata          -0.13
[--          -0.13
POSITIVE LOGITS
itself       0.31
themselves   0.22
herself      0.21
meta         0.19
kendisi      0.19
自身         0.18
meta         0.17
selbst       0.17
Himself      0.17
(meta        0.16
Activations Density 0.319%