INDEX
Explanations
punctuation marks and specific stylistic elements within the text
Negative Logits
iez      -0.18
amus     -0.15
utan     -0.14
оÑİ      -0.14
antu     -0.13
áž       -0.13
asmus    -0.13
esz      -0.13
asu      -0.13
336      -0.13
Positive Logits
icari    0.15
aptive   0.14
ammer    0.14
lear     0.13
Straw    0.13
stras    0.13
til      0.13
_traits  0.13
905      0.13
OTAL     0.13
Activations Density: 0.026%