INDEX
Explanations
references to authorship and the act of writing
Negative Logits
illusion    -0.71
apon        -0.70
grounds     -0.65
asia        -0.63
ele         -0.63
eal         -0.62
EMS         -0.61
RS          -0.60
eno         -0.59
pneum       -0.59
Positive Logits
memos       0.83
vironment   0.82
letters     0.78
smanship    0.77
penned      0.77
textbooks   0.76
Letters     0.76
LET         0.76
papers      0.76
lishing     0.75
Activations Density 0.028%