INDEX
Explanations
words that signal the beginning of an article or a piece of writing
instances of attribution or authorship in texts
Negative Logits
itives    -0.82
atron     -0.71
pains     -0.71
ioxide    -0.67
ickets    -0.65
ptions    -0.65
isable    -0.64
hement    -0.64
stones    -0.62
enthal    -0.61
Positive Logits
akuya                    0.89
virtue                   0.83
contrast                 0.79
catch                    0.71
stand                    0.69
pass                     0.68
implication              0.67
Hilbert                  0.66
cloneembedreportprint    0.66
JUL                      0.64
Activations Density: 0.025%