INDEX
Explanations
Attends to mentions of moments described as occurring "before" something, from later tokens that convey the actions or states that follow.
Head Attr Weights
0:0.12
1:0.52
2:0.07
3:0.05
4:0.05
5:0.06
6:0.02
7:0.07
Negative Logits
testens          -0.44
SequentialGroup  -0.42
Roskov           -0.40
AutoScaleMode    -0.40
sebelum          -0.40
thâu             -0.39
enumi            -0.39
                 -0.38
PerformLayout    -0.36
EnglishChoose    -0.36
Positive Logits
ujednoznacz      0.44
chise            0.40
__*/             0.38
erequisites      0.37
isenberg         0.37
للاسماء          0.35
клопе            0.34
sohn             0.34
purpoſe          0.34
Edited           0.33
Activations Density 0.628%