INDEX
Explanations
capital letters with unusual symbols and low numbers
instances of discourse or communication topics
Negative Logits
exting      -0.83
citiz       -0.80
newcom      -0.78
tremend     -0.78
undermin    -0.77
eleph       -0.76
aditional   -0.75
exha        -0.74
senal       -0.74
proport     -0.73
Positive Logits
↵               0.86
SPONSORED       0.82
PHOTOS          0.80
Scroll          0.76
NPR             0.74
Specifically    0.70
Eh              0.68
DragonMagazine  0.68
iak             0.68
sonian          0.67
Activations Density: 0.541%