Explanations
punctuation and formatting cues within the text
Negative Logits
orio      -0.19
-Sah      -0.17
ichten    -0.15
icio      -0.14
osc       -0.14
sembly    -0.14
unct      -0.14
nomine    -0.14
cate      -0.14
vide      -0.14
Positive Logits
_Generic          0.17
generic           0.17
purchase          0.17
Wend              0.16
(token missing)   0.15
order             0.15
(token missing)   0.15
_purchase         0.15
Fry               0.14
Autos             0.14
Activation density: 0.005%
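The logit lists and density figure above are typical of sparse-autoencoder feature dashboards. The sketch below is a minimal illustration, assuming (not confirmed by this page) that the logit values come from projecting the feature's decoder direction onto each token's unembedding vector, and that density is the fraction of tokens on which the feature's activation exceeds zero. The toy vocabulary, vectors, and the function names `logit_effects`, `top_and_bottom`, and `activation_density` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot product of a (hypothetical) feature direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above the threshold."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > threshold) / len(activations)


# Toy example: a 2-dimensional "model" with three tokens (values are made up).
toy_unembed = {"purchase": [1.0, 0.0], "order": [0.5, 0.5], "orio": [-1.0, 0.2]}
pos, neg = top_and_bottom(logit_effects([0.2, 0.1], toy_unembed), k=1)
print(pos, neg)
print(activation_density([0.0, 0.0, 3.1, 0.0]))
```

For scale, a density of 0.005% corresponds to 0.00005, i.e. roughly one token in 20,000 activating the feature.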