INDEX
Explanations
phrases indicating readiness or willingness to take action
Head Attr Weights
0:0.01
1:0.01
2:0.08
3:0.07
4:0.09
5:0.02
6:0.09
7:0.35
8:0.02
9:0.02
10:0.08
11:0.11
Negative Logits
acebook       -1.61
itures        -1.46
Tycoon        -1.43
uthor         -1.31
umblr         -1.30
inity         -1.29
Influence     -1.28
vernment      -1.28
avery         -1.26
Bought        -1.26
Positive Logits
unle          1.56
prepared      1.56
hardened      1.55
upt           1.51
ready         1.51
ppo           1.46
conting       1.44
seasoned      1.39
preparations  1.34
bursting      1.33
Activations Density 0.013%