Explanations
author attribution in a text
phrases indicating authorship or attribution in written content
Negative Logits
Token        Logit
itives       -0.82
Finish       -0.72
isable       -0.68
ال           -0.67
vation       -0.67
SPONSORED    -0.64
ioxide       -0.64
MpServer     -0.62
Insp         -0.60
pains        -0.58
Positive Logits
Token        Logit
akuya        1.08
contrast     0.96
Hilbert      0.86
virtue       0.83
stand        0.81
pass         0.78
catch        0.75
tes          0.73
tom          0.73
ron          0.72
Activations Density 0.029%