Explanations
instances of the word "to"
Head Attr Weights
0:0.03
1:0.01
2:0.06
3:0.05
4:0.16
5:0.02
6:0.23
7:0.07
8:0.04
9:0.03
10:0.09
11:0.16
Negative Logits
Token            Logit
lins             -1.36
Clarkson         -1.27
Aless            -1.27
understatement   -1.26
routed           -1.25
Levant           -1.23
Fired            -1.23
newsletter       -1.23
Heist            -1.22
Emin             -1.21
Positive Logits
Token            Logit
ascript          1.57
asus             1.56
ukong            1.50
Rat              1.47
Flavoring        1.46
sites            1.42
do               1.40
annabin          1.33
ritic            1.32
develop          1.31
Activations Density 0.001%