INDEX
Explanations
the repeated use of the word "it"
Head Attr Weights
0:0.01
1:0.01
2:0.16
3:0.05
4:0.20
5:0.02
6:0.19
7:0.17
8:0.03
9:0.04
10:0.03
11:0.04
Negative Logits
mbuds:-1.76
neys:-1.66
ount:-1.53
cius:-1.51
Klux:-1.46
ront:-1.46
ourses:-1.45
ossom:-1.44
Fram:-1.41
clud:-1.40
Positive Logits
TBD:1.45
guesses:1.36
kidding:1.31
nil:1.31
Haku:1.29
Zin:1.29
$_:1.28
guess:1.26
�:1.25
Syl:1.23
Activations Density: 0.000%