Explanations
the adjective "heavy" followed by a noun
Negative Logits
myſelf            -1.11
'\\;'             -1.07
Theſe             -1.05
་་                -1.02
―――――             -0.99
Efq               -0.96
itſelf            -0.95
Reſ               -0.92
$.                -0.92
ModelExpression   -0.91
Positive Logits
a        0.69
an       0.68
↵↵       0.67
;        0.66
n        0.65
<eos>    0.65
.        0.63
the      0.63
with     0.62
:        0.59
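One common way to produce lists like the two above is to rank every vocabulary token by the feature's direct effect on the output logits. The sketch below makes several assumptions. It treats that effect as the plain dot product of the feature's decoder direction with each token's unembedding row, ignoring any final layer norm or scaling. The names `logit_effects`, `decoder_dir`, `unembed` and `vocab` are hypothetical, and the values are toy numbers rather than any real model's weights. It is not necessarily how this dashboard computed its numbers.

```python
from __future__ import annotations

from typing import Sequence


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return (most negative, most positive) tokens by direct logit effect.

    Assumption: unembed is laid out as unembed[token_index][dim], one row
    per vocabulary entry, and the effect is a bare dot product.
    """
    scores: list[tuple[str, float]] = []
    for token, row in zip(vocab, unembed):
        # Dot product of the feature's decoder direction with this token's unembedding row.
        score = sum(d * w for d, w in zip(decoder_dir, row))
        scores.append((token, score))
    # Ascending sort: most negative effects first, most positive last.
    scores.sort(key=lambda pair: pair[1])
    negative = scores[:k]
    positive = scores[::-1][:k]
    return negative, positive


if __name__ == "__main__":
    # Toy example: a 2-dimensional model with a 3-token vocabulary (hypothetical values).
    neg, pos = logit_effects(
        decoder_dir=[1.0, 0.0],
        unembed=[[0.5, 0.1], [-0.8, 0.3], [0.2, 0.9]],
        vocab=["a", "itself", "the"],
        k=1,
    )
    print(neg, pos)
```

With `k=1`, the toy call returns `itself` as the most negative token and `a` as the most positive one.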
Activation Density: 0.441%
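Activation density is commonly read as the percentage of sampled tokens on which the feature is active. At 0.441%, this feature fires on roughly 4.41 of every 1,000 tokens. The sketch below assumes that "active" means an activation strictly above a threshold of 0. It also assumes the input is a flat list of per-token activations. The function name `activation_density` is hypothetical, and the dashboard's exact sampling and threshold are not stated in the source.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature is considered active.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Toy data: the feature fires on 5 of 1,000 tokens (hypothetical values).
    acts = [0.0] * 995 + [1.2] * 5
    print(activation_density(acts))
```

On the toy data, 5 firing tokens out of 1,000 give a density of 0.5%.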