INDEX
Explanations
occurrences of the word "the."
Head Attr Weights
0:0.08
1:0.06
2:0.09
3:0.09
4:0.08
5:0.08
6:0.08
7:0.07
8:0.07
9:0.08
10:0.08
11:0.09
Negative Logits
Legislation    -2.18
Locke          -1.85
argon          -1.65
Cassidy        -1.60
Comment        -1.58
aceae          -1.54
urated         -1.53
Commentary     -1.51
Liberal        -1.51
kell           -1.50
Positive Logits
bum            1.54
uten           1.53
pan            1.52
cum            1.49
pals           1.49
peers          1.48
mast           1.47
ruined         1.44
shakes         1.44
iets           1.44
Activations Density 0.000%