Explanations
first instance
first item or action
Negative Logits
Token       Logit
\           1.88
'           1.75
:           1.71
-           1.57
?           1.55
"           1.48
)           1.46
(           1.42
;           1.41
,           1.37

Positive Logits
Token       Logit
s           1.59
first       1.51
to          1.43
as          1.26
in          1.19
named       1.19
not         1.17
default     1.16
text        1.14
underwear   1.14

Activation density: 0.596%