Explanations
references or mentions of specific sources or citations in a text
instances of the word "cite" and its variations, indicating references or attributions in text
Negative Logits
ipeg        -0.73
¯¯¯¯¯¯¯¯    -0.73
visors      -0.69
tackle      -0.67
shake       -0.67
hattan      -0.67
ixtape      -0.67
xy          -0.66
ichick      -0.65
spawn       -0.65
Positive Logits
citations   0.86
scriptures  0.81
warnings    0.80
excuses     0.78
anity       0.71
aloud       0.70
ibly        0.70
phrases     0.69
scripture   0.69
References  0.68
Activations Density: 0.036%