Explanations
references to specific authors or sources at the beginning of text excerpts
phrases that indicate authorship or attribution in texts
Negative Logits
Token       Logit
grip        -0.63
fabric      -0.61
standing    -0.59
listed      -0.58
current     -0.58
right       -0.57
muff        -0.56
hood        -0.56
origin      -0.55
green       -0.55
Positive Logits
Token       Logit
By          3.25
By          1.95
by          1.83
BY          1.79
Through     1.67
Around      1.48
From        1.41
ccording    1.40
With        1.40
Within      1.39
Activations Density 0.014%