INDEX
Explanations
phrases that express desires or intentions
Head Attr Weights
0:0.02
1:0.03
2:0.08
3:0.10
4:0.07
5:0.03
6:0.06
7:0.17
8:0.06
9:0.04
10:0.09
11:0.20
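The head attribution weights above can be ranked to see which heads contribute most. The following is a minimal sketch, not part of the original page: the names `head_attr_weights` and `top_heads` are hypothetical, and the values are simply copied from the listing.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# The dictionary name and helper below are illustrative assumptions.
head_attr_weights = {
    0: 0.02, 1: 0.03, 2: 0.08, 3: 0.10, 4: 0.07, 5: 0.03,
    6: 0.06, 7: 0.17, 8: 0.06, 9: 0.04, 10: 0.09, 11: 0.20,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


top = top_heads(head_attr_weights)
total = round(sum(head_attr_weights.values()), 2)

print(top)    # the three heads with the largest listed weight
print(total)  # sum of the listed weights, rounded to two decimals
```

With these values, heads 11 and 7 carry the largest listed weights. The listed weights sum to 0.95 rather than 1, which may be due to rounding in the display.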
Negative Logits
TNT:-1.53
Socialist:-1.46
�:-1.43
�:-1.42
channelAvailability:-1.39
Wonderful:-1.34
SON:-1.32
Unch:-1.31
Prelude:-1.30
endorsed:-1.29
Positive Logits
vati:1.73
treatment:1.68
ridor:1.61
orgetown:1.53
retali:1.51
outine:1.48
erase:1.45
ccoli:1.43
pta:1.40
gil:1.39
Activations Density 0.021%