INDEX
Explanations
phrases related to the need for citations or references
references to citations or sources in the text
Negative Logits
pora  -0.72
inav  -0.70
milo  -0.68
deals  -0.65
graduate  -0.64
Jet  -0.62
wives  -0.61
manship  -0.61
quer  -0.60
ratulations  -0.59
Positive Logits
needed  0.95
=]  0.87
omitted  0.85
?]  0.78
needed  0.77
][  0.76
redacted  0.72
]"  0.72
requested  0.71
needs  0.70
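Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix, then sorting the per-token scores. The sketch below assumes that method; the function names, the two-dimensional vectors, and the tiny vocabulary are hypothetical toy values, not this feature's real weights.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def feature_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }

def top_logits(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

# Hypothetical toy weights, for illustration only.
decoder_dir = [1.0, -0.5]
unembed = {
    "needed": [0.9, -0.1],
    "omitted": [0.8, 0.0],
    "pora": [-0.7, 0.1],
    "deals": [-0.6, 0.2],
}
positive, negative = top_logits(feature_logit_effects(decoder_dir, unembed), k=2)
print(positive)
print(negative)
```

Real dashboards may add steps this sketch omits (for example, accounting for the final layer norm), so treat it as an outline of the idea.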
Activation Density: 0.042%
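A density of 0.042% means the feature fires on roughly 4 of every 10,000 tokens. The sketch below assumes density is the percentage of token positions whose activation is strictly above zero; the function name and the synthetic activation list are hypothetical.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Synthetic example: 42 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(42):
    acts[i * 2_000] = 1.3  # arbitrary positive activation value
density = activation_density(acts)
print(f"{density:.3f}%")
```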