INDEX
Explanations
actions related to buying, playing, or making decisions
abilities and actions
Negative Logits
y          -0.51
ho         -0.44
</strong>  -0.40
ya         -0.39
ia         -0.39
ds         -0.38
?          -0.38
enson      -0.37
<eos>      -0.36
del        -0.36
Positive Logits
canst      0.89
ロウィン   0.68
queſta     0.66
pouvoit    0.66
AndEndTag  0.65
potest     0.65
ſelves     0.64
outheast   0.63
[@BOS@]    0.62
<unused8>  0.62
Activations Density 0.076%
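
The density figure above can be read as the share of sampled tokens on which the feature fires. The sketch below shows one way such a percentage might be computed; it assumes "active" means an activation strictly greater than zero, and the function name, threshold parameter, and sample data are hypothetical, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 tokens, 76 of which activate the feature.
sample = [0.0] * 99_924 + [1.5] * 76
density = activation_density(sample)
print(f"Activations Density {density:.3f}%")
```

Under this reading, a density of 0.076% corresponds to roughly 76 active tokens per 100,000 sampled, which marks the feature as sparse.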