Explanations
words related to long-term planning and objectives
Head Attr Weights
0:0.06
1:0.03
2:0.15
3:0.05
4:0.36
5:0.05
6:0.02
7:0.03
8:0.04
9:0.09
10:0.04
11:0.02
Negative Logits
ogly     -1.47
Steal    -1.45
Holiday  -1.38
kind     -1.36
anan     -1.28
along    -1.28
CHA      -1.27
gal      -1.25
aldi     -1.25
Dub      -1.24
Positive Logits
iatus      1.87
cale       1.66
than       1.63
]}         1.40
urations   1.39
�          1.33
ppo        1.29
toler      1.27
itus       1.27
stable     1.27
Activations Density: 0.006%