INDEX
Explanations
punctuation marks and quotes in the text
Negative Logits
Token       Logit
Datuak      -0.66
myſelf      -0.62
pleaſure    -0.59
fubject     -0.58
Majefty     -0.55
purpoſe     -0.55
juſ         -0.55
itſelf      -0.54
poffible    -0.54
الحره       -0.52
Positive Logits
Token       Logit
)))),       0.68
')),        0.65
})),        0.63
))),        0.62
])),        0.62
")),        0.60
)))));      0.58
)))))       0.58
']),        0.57
())),       0.57
Activation Density: 0.013%