Explanations
connections involving conjunctions and relational phrases
Negative Logits
Token        Logit
[missing]    -0.70
a            -0.60
<eos>        -0.58
(            -0.55
the          -0.53
...          -0.53
,            -0.51
an           -0.51
…            -0.49
No           -0.49
Positive Logits
Token           Logit
Theſe           1.00
―――――           0.97
Houſe           0.96
✨:              0.93
Reſ             0.90
Anſ             0.89
houſe           0.89
photolibrary    0.89
########.       0.87
ainfi           0.87
Activations Density 0.276%