INDEX
Explanations
instances of parentheses and mentions of the first-person perspective
Negative Logits
of              -0.19
(               -0.19
&               -0.17
[not rendered]  -0.17
__              -0.16
iff             -0.16
,               -0.16
or              -0.16
),              -0.16
*               -0.15
Positive Logits
…)↵↵            0.19
Uvs             0.17
úsqueda         0.17
edir            0.17
ebek            0.16
↵↵              0.16
HITE            0.16
Note            0.15
.yy             0.15
Архів           0.15
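The page does not say how these two lists were produced. Feature dashboards of this kind commonly rank vocabulary tokens by the feature's direct effect on the output logits, meaning the dot product of the feature's decoder direction with each column of the model's unembedding matrix. The following is a minimal pure-Python sketch under that assumption. `logit_effects`, `decoder_direction`, and `unembed` are hypothetical names, and the toy numbers are illustrative, not data from this feature.

```python
from typing import List, Sequence, Tuple

TokenScore = Tuple[str, float]


def logit_effects(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[TokenScore], List[TokenScore]]:
    """Rank tokens by a feature's direct effect on the output logits.

    Assumption (hypothetical helper): the effect on token t is the dot
    product of the feature's decoder direction (length d_model) with
    column t of the unembedding matrix (shape d_model x vocab_size).
    Returns (k most negative, k most positive) (token, effect) pairs.
    """
    d_model = len(decoder_direction)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    scores: List[TokenScore] = []
    for t, token in enumerate(vocab):
        effect = sum(decoder_direction[i] * unembed[i][t] for i in range(d_model))
        scores.append((token, effect))
    # Sort ascending once: the head holds the most negative effects,
    # and the reversed list starts with the most positive ones.
    ranked = sorted(scores, key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


if __name__ == "__main__":
    # Toy model: d_model = 2, vocabulary of 4 tokens (illustrative only).
    vocab = ["a", "b", "c", "d"]
    unembed = [
        [0.2, -0.1, 0.0, 0.3],
        [0.4, 0.2, -0.2, 0.0],
    ]
    negative, positive = logit_effects([1.0, -0.5], unembed, vocab, k=2)
    print("Negative:", negative)
    print("Positive:", positive)
```

Sorting once and slicing both ends keeps the two lists consistent with each other. A real model would use a matrix library for the projection, but the ranking logic is the same.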
Activation density: 0.083%
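Activation density is usually reported as the share of token positions on which a feature is active. At 0.083%, that is roughly 83 of every 100,000 tokens. The following is a minimal sketch that assumes "active" means an activation strictly above a threshold of zero. The threshold behind this figure is not stated, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable, Sequence


def activation_density(
    activations: Iterable[Sequence[float]], threshold: float = 0.0
) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    `activations` holds one sequence of per-token activations per document.
    Assumption: a position counts as active only if value > threshold.
    """
    total = 0
    active = 0
    for document in activations:
        for value in document:
            total += 1
            if value > threshold:
                active += 1
    if total == 0:
        raise ValueError("no token positions supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Two toy documents with 8 token positions in total (illustrative only).
    docs = [[0.0, 1.2, 0.0], [0.0, 0.0, 3.4, 0.0, 0.0]]
    print(f"Activation density: {activation_density(docs):.3f}%")
```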