INDEX
Explanations
specific references or citations within a text
references to citing sources or evidence
Negative Logits
orld      -0.70
ascript   -0.69
ipeg      -0.68
ixtape    -0.66
cum       -0.66
olen      -0.65
cas       -0.65
hattan    -0.65
cos       -0.64
quer      -0.64
Positive Logits
citations   0.96
citation    0.88
enza        0.86
quotes      0.80
cite        0.80
warnings    0.80
âĨij        0.79
cited       0.78
citing      0.77
="#         0.75
Activations Density: 0.022%