INDEX
Explanations
instances of the letter 'T' and related characters in the text
Negative Logits
tune      -0.17
ån        -0.17
ietet     -0.15
imoto     -0.14
quent     -0.14
ished     -0.14
gulp      -0.14
gua       -0.14
itored    -0.14
ittest    -0.14
Positive Logits
hat       0.34
o         0.33
here      0.31
he        0.30
his       0.29
hey       0.28
hen       0.25
h         0.24
hus       0.24
hat       0.20
Activations Density 0.015%