Explanations
Instances of the word "wrote" indicating authorship or the creation of written content
Negative Logits
agara      -0.73
angular    -0.72
omical     -0.69
abol       -0.68
ĪĴ         -0.67
acles      -0.67
ILCS       -0.67
phant      -0.67
obyl       -0.66
nostic     -0.65
Positive Logits
sarcast    0.90
eloqu      0.79
rhet       0.78
letters    0.75
letter     0.74
wrote      0.72
scathing   0.72
quoting    0.72
furiously  0.71
caption    0.70
Activations Density: 0.019%