Explanations
specific formatting elements or structural markers in the text
Negative Logits
Token      Logit
Tone       -0.17
bach       -0.16
ices       -0.16
ntax       -0.15
ACES       -0.15
aget       -0.15
comp       -0.14
Liberty    -0.14
Birch      -0.14
abbage     -0.14
Positive Logits
Token      Logit
Cooke       0.18
eden        0.15
-terminal   0.15
caff        0.15
hea         0.14
kah         0.14
мени        0.14
anko        0.14
âĸĪâĸĪ      0.14
HOOK        0.14
Activations Density: 0.028%