INDEX
Explanations
verbal mentions of specific information sources
instances of the word "cite."
Negative Logits
WINDOWS     -0.70
ALP         -0.69
normal      -0.67
OOOO        -0.64
SOLD        -0.64
Extras      -0.63
Ultra       -0.63
Shadows     -0.61
fighting    -0.60
Error       -0.59
Positive Logits
cite              3.37
cites             2.51
cited             1.44
citing            1.40
Citation          1.30
mention           1.14
recite            1.08
disproportionate  1.04
citations         1.03
mentions          1.01
Activations Density: 0.017%