Explanations
references or citations in text
references to the act of citing sources or evidence
Negative Logits
orld        -0.81
ipeg        -0.74
¯¯¯¯¯¯¯¯    -0.74
xy          -0.71
hattan      -0.69
ibaba       -0.68
visors      -0.67
ascript     -0.67
amination   -0.66
ixtape      -0.66
Positive Logits
citations   0.95
citation    0.87
enza        0.81
cite        0.78
cited       0.76
citing      0.75
âĨij        0.70
warnings    0.70
Cit         0.70
Forbidden   0.70
Activations Density 0.019%