Explanations
numerical values or statistics within the text
Negative Logits
Token   Logit
dk      -0.15
bob     -0.15
anol    -0.14
Jako    -0.14
:&      -0.14
.       -0.14
nicos   -0.14
.nlm    -0.13
rc      -0.13
iane    -0.13
Positive Logits
Token   Logit
6       0.25
9       0.25
8       0.24
7       0.24
13      0.24
5       0.23
12      0.23
15      0.23
11      0.23
14      0.22
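Each list above is ordered by the feature's logit effect on individual tokens, highest first for the positive list and lowest first for the negative one. Below is a minimal sketch of that selection step. It assumes the per-token logit weights have already been computed (for example by projecting the feature's decoder direction through the unembedding). That precomputation, along with the function name `top_and_bottom_logits` and the parameter `k`, is a hypothetical illustration and not the dashboard's actual code.

```python
def top_and_bottom_logits(weights, k=10):
    """Return the k highest and k lowest (token, weight) pairs.

    `weights` maps each token to the feature's logit effect on it.
    These values are assumed to be precomputed (hypothetical input).
    """
    items = list(weights.items())
    # sorted() is stable, so tied tokens keep their original order.
    positive = sorted(items, key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(items, key=lambda kv: kv[1])[:k]
    return positive, negative


# A few values taken from the lists above.
example = {"6": 0.25, "9": 0.25, "5": 0.23, "dk": -0.15, "iane": -0.13, "rc": -0.13}
pos, neg = top_and_bottom_logits(example, k=2)
print(pos)  # [('6', 0.25), ('9', 0.25)]
print(neg)  # [('dk', -0.15), ('iane', -0.13)]
```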
Activation Density: 0.067%
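As a rough reading of the density figure, 0.067% corresponds to about 67 firing positions per 100,000 tokens. The sketch below assumes density means the fraction of token positions where the feature's activation is strictly positive. The zero threshold and the synthetic activation list are illustrative assumptions, not values from the source.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumes "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic data: 67 firing positions out of 100,000 tokens.
acts = [1.0] * 67 + [0.0] * (100_000 - 67)
print(f"{activation_density(acts):.3%}")  # 0.067%
```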