INDEX
Explanations
questions and inquiries related to understanding or learning more about a topic
Negative Logits
Token        Logit
3            -0.19
4            -0.18
2            -0.18
5            -0.17
6            -0.17
1            -0.17
8            -0.17
11           -0.16
ourselves    -0.16
9            -0.15
Positive Logits
Token        Logit
yo           0.43
yu           0.43
you          0.40
u            0.36
y            0.36
tou          0.36
ou           0.34
ya           0.32
yp           0.32
Ñĥ           0.30
Activations Density: 0.229%