INDEX
Explanations
the word "the"
Negative Logits
Token       Logit
Harness     -0.48
clef        -0.46
Slay        -0.45
Receipt     -0.45
czyna       -0.44
alve        -0.43
Footnote    -0.41
réguli      -0.41
Beet        -0.41
tenis       -0.41
Positive Logits
Token       Logit
<bos>       3.28
__':        0.90
__":        0.77
/**         0.70
']>         0.70
'           0.68
})}         0.66
'}>         0.65
#           0.64
')):        0.64
Activations Density: 0.640%