INDEX
Explanations
the word "Source" followed by a number
references to "Source" or citations within the text
Negative Logits
akuya      -0.69
haste      -0.67
benches    -0.64
staggered  -0.63
cases      -0.63
atro       -0.63
specials   -0.62
oooooooo   -0.61
onom       -0.61
gall       -0.61
POSITIVE LOGITS
Source     3.93
Source     2.61
Sources    2.15
source     1.96
source     1.92
Sources    1.72
sources    1.51
SOURCE     1.49
OURCE      1.46
ources     1.39
Activations Density 0.006%