INDEX
Explanations
phrases indicating a sequence or order of actions
phrases indicating action or intention
Head Attribution Weights (head: weight)
0: 0.11
1: 0.04
2: 0.06
3: 0.08
4: 0.06
5: 0.12
6: 0.07
7: 0.08
8: 0.09
9: 0.07
10: 0.13
11: 0.05
Negative Logits
\":          -0.97
andowski     -0.90
obstruction  -0.90
Sidd         -0.88
spoken       -0.88
Penn         -0.87
Pod          -0.86
genius       -0.85
#$           -0.84
keyword      -0.84
Positive Logits
livion  1.26
byss    1.23
ensis   1.11
amus    0.99
ォ      0.98
utics   0.97
iris    0.95
cules   0.95
utic    0.94
itia    0.93
Activations Density: 0.244%