INDEX
Explanations
attends to the same or similar tokens from preceding different tokens
Head Attr Weights
0:0.05
1:0.09
2:0.07
3:0.12
4:0.44
5:0.05
6:0.07
7:0.06
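
The weights above can be read as one attribution value per attention head, indexed 0 to 7; that reading is an assumption, since the listing does not define the column. Under it, a minimal sketch for picking out the dominant head and its share of the listed total might look like the following. The names `head_attr_weights` and `dominant_head` are hypothetical and only serve the illustration.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Treating the keys as attention-head indices is an assumption.
head_attr_weights = {
    0: 0.05, 1: 0.09, 2: 0.07, 3: 0.12,
    4: 0.44, 5: 0.05, 6: 0.07, 7: 0.06,
}


def dominant_head(weights):
    """Return (head, weight, share) for the head with the largest weight.

    `share` is the weight divided by the sum of all listed weights, which
    need not be exactly 1 because the displayed values are rounded.
    """
    total = sum(weights.values())
    head = max(weights, key=weights.get)
    return head, weights[head], weights[head] / total


head, weight, share = dominant_head(head_attr_weights)
print(f"head {head}: weight {weight:.2f}, {share:.1%} of the listed total")
```

With the values listed, head 4 carries by far the largest weight, so the feature's attention-side behavior is most plausibly attributed to that head.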
Negative Logits
ști         -0.26
Strickland  -0.24
thước       -0.23
vertes      -0.23
sk          -0.23
b           -0.22
jokingly    -0.22
kế          -0.21
justement   -0.21
vode        -0.21
Positive Logits
SequentialGroup  0.37
ostavi           0.34
StoryboardSegue  0.34
'\\;'            0.32
GenerationType   0.31
المعيارى         0.31
Réponses         0.31
+:+              0.30
Aiheesta         0.30
rrggbb           0.29
Activations Density 0.423%