INDEX
Explanations
instances of the word "to"
Head Attr Weights
0:0.02
1:0.01
2:0.05
3:0.08
4:0.08
5:0.03
6:0.21
7:0.23
8:0.03
9:0.02
10:0.07
11:0.10
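
The head attribution weights above can be read as a small lookup from head index to weight. The following is a minimal sketch, not part of the original dashboard, of how the listing might be parsed and ranked. The names `RAW`, `parse_head_weights`, and `top_heads` are hypothetical helpers introduced for illustration, and the `head:weight` line format is assumed from the listing itself.

```python
# Head attribution weights copied verbatim from the listing ("head:weight" per line).
RAW = """0:0.02
1:0.01
2:0.05
3:0.08
4:0.08
5:0.03
6:0.21
7:0.23
8:0.03
9:0.02
10:0.07
11:0.10"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a {head_index: weight} dict."""
    weights = {}
    for line in text.splitlines():
        head, _, value = line.partition(":")
        weights[int(head)] = float(value)
    return weights


def top_heads(weights, k=2):
    """Return the k head indices with the largest weights, largest first."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


weights = parse_head_weights(RAW)
total = sum(weights.values())
top = top_heads(weights)
# Fraction of the listed weight carried by the top-k heads.
share = sum(weights[h] for h in top) / total

print("top heads:", top)
print("share of listed weight:", round(share, 2))
```

With the values as listed, heads 7 and 6 carry the largest weights. The listed values are rounded to two decimals, so their total need not equal exactly 1.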
Negative Logits
Loren: -1.31
aretz: -1.31
Worth: -1.30
ylum: -1.28
totaled: -1.24
Hammond: -1.23
Huck: -1.22
Remy: -1.22
ibe: -1.22
cause: -1.20
Positive Logits
paren: 1.67
disg: 1.65
behavi: 1.50
conting: 1.48
prepar: 1.41
toile: 1.36
iggins: 1.36
resil: 1.32
dayName: 1.30
councill: 1.28
Activations Density 0.003%