INDEX
Explanations
phrases indicating movement toward a progressive or positive direction
Head Attr Weights
0:0.16
1:0.02
2:0.06
3:0.17
4:0.02
5:0.05
6:0.02
7:0.05
8:0.02
9:0.01
10:0.36
11:0.02
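The weights above can be ranked with a few lines of Python. This is a minimal sketch, assuming each line has the form `head_index:weight`. The helper name `parse_head_weights` is hypothetical and not part of any dashboard tooling. The listed values sum to roughly 0.96 rather than 1, which probably reflects rounding in the display.

```python
# Head attribution weights copied verbatim from the listing above.
RAW_WEIGHTS = """
0:0.16
1:0.02
2:0.06
3:0.17
4:0.02
5:0.05
6:0.02
7:0.05
8:0.02
9:0.01
10:0.36
11:0.02
"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a {head_index: weight} dict.

    Hypothetical helper; the 'head:weight' line format is an assumption
    based on how the values are displayed.
    """
    weights = {}
    for line in text.strip().splitlines():
        head, value = line.split(":")
        weights[int(head)] = float(value)
    return weights


weights = parse_head_weights(RAW_WEIGHTS)

# Rank heads from largest to smallest attribution weight.
ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_weight = ranked[0]
total = sum(weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of displayed weights: {total:.2f}")
```

Ranking this way shows that head 10 carries the largest share, followed by heads 3 and 0.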
Negative Logits
fame        -2.31
televised   -2.06
��          -2.02
expire      -1.97
notoriety   -1.95
cumbers     -1.93
Serving     -1.87
Ability     -1.84
Recorded    -1.82
residing    -1.81
Positive Logits
direction   3.66
wrong       3.35
directions  3.31
oward       3.16
wrong       2.82
correct     2.75
opposite    2.70
towards     2.55
Wrong       2.51
wise        2.49
Activations Density 0.033%