Explanations
Attends to the duration token "LENGTH" from the final token "SHORT".
Head Attr Weights
0: 0.11
1: 0.18
2: 0.20
3: 0.06
4: 0.10
5: 0.06
6: 0.07
7: 0.17
Negative Logits
клопе: -0.36
AssemblyTitle: -0.28
⊱: -0.28
anthene: -0.26
ft: -0.26
cie: -0.25
npmjs: -0.25
breakthroughs: -0.24
𝗳: -0.24
kämp: -0.24
Positive Logits
purpoſe: 0.43
houſe: 0.41
beſt: 0.41
Jefus: 0.41
ValuePair: 0.41
myſelf: 0.40
Efq: 0.40
Reſ: 0.40
يتيمه: 0.39
ſtate: 0.38
Activations Density: 0.002%