INDEX
Explanations
references to citations needed within a text
instances of citations or references to sources
Negative Logits
inav          -0.86
pora          -0.82
milo          -0.77
ratulations   -0.71
roups         -0.64
quer          -0.63
ynes          -0.62
gradation     -0.61
wear          -0.61
oshenko       -0.60
Positive Logits
=]            1.04
omitted       1.03
]"            0.86
redacted      0.83
footnote      0.78
])            0.76
]),           0.76
],[           0.76
?]            0.75
]             0.74
Activations Density 0.038%