Explanations
statements or references from sources as evidence or support
references or citations in a text
Negative Logits
orld        -0.84
ipeg        -0.79
amination   -0.72
quer        -0.70
visors      -0.69
ifix        -0.69
vic         -0.68
sembly      -0.68
ct          -0.68
xy          -0.67
Positive Logits
citations   0.90
warnings    0.87
scriptures  0.77
examples    0.75
sources     0.74
excuses     0.72
citation    0.70
citing      0.70
unnamed     0.69
quotes      0.67
Activations Density 0.037%