INDEX
Explanations
phrases related to authorship and publication context
Negative Logits
              -0.16
just          -0.16
Rig           -0.14
Prel          -0.14
ỉ             -0.14
hor           -0.14
776           -0.14
isu           -0.14
anywhere      -0.14
hor           -0.14
Positive Logits
originally    0.23
Originally    0.23
original      0.23
Originally    0.21
/original     0.19
(original     0.19
原            0.18
original      0.18
ориг          0.18
nguyên        0.17
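The two lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way to build such lists, assumed here and not confirmed by this page, is to project the feature's decoder direction onto each token's unembedding vector and sort the scores. The sketch below uses plain Python; `decoder_dir`, `unembed`, the helper names, and the toy numbers are all hypothetical, and real pipelines may also fold in the model's final layer norm.

```python
# Minimal sketch: rank tokens by the direct logit effect of one feature.
# ASSUMPTION: effect(token) = dot(decoder_dir, unembed[token]); all names and data are hypothetical.

def logit_effects(decoder_dir, unembed):
    """Map each token to the dot product of the feature's decoder vector and that token's unembedding vector."""
    return {
        token: sum(d * w for d, w in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return (k most positive, k most negative) lists of (token, score) pairs."""
    bottom = sorted(effects.items(), key=lambda kv: kv[1])[:k]             # most negative first
    top = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]  # most positive first
    return top, bottom


if __name__ == "__main__":
    # Toy 2-dimensional model with a four-token vocabulary (hypothetical values).
    decoder_dir = [1.0, 0.0]
    unembed = {
        "original": [0.23, 0.5],
        "just": [-0.16, 0.9],
        "anywhere": [-0.14, 0.1],
        "hor": [0.0, 1.0],
    }
    top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("positive:", top)
    print("negative:", bottom)
```

Sorting the full vocabulary is fine for a sketch; with a real vocabulary of tens of thousands of tokens a partial selection such as `heapq.nlargest` would avoid the full sort.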
Activation Density: 0.109%
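Activation density is the share of token positions on which the feature fires; 0.109% works out to roughly one token in every 920. A minimal sketch of that statistic follows, assuming that "fires" means an activation strictly above zero. The threshold and the function name `activation_density` are assumptions for illustration, not taken from this page.

```python
# Minimal sketch of an activation-density statistic.
# ASSUMPTION: a feature "fires" when its activation is strictly greater than `threshold` (default 0.0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical run: 1,000 tokens, the feature fires on exactly one of them.
    acts = [0.0] * 999 + [1.2]
    print(f"{activation_density(acts):.3f}%")  # prints 0.100%
```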