Explanations
phrases expressing comfort and discomfort in various contexts
Head Attr Weights
0:0.02
1:0.00
2:0.07
3:0.05
4:0.13
5:0.03
6:0.03
7:0.45
8:0.01
9:0.03
10:0.07
11:0.06
Negative Logits
ennial      -1.64
acio        -1.60
aceutical   -1.55
lass        -1.53
phabet      -1.51
govtrack    -1.50
ials        -1.47
idal        -1.46
ritic       -1.45
ocus        -1.43
Positive Logits
admitting     1.78
knowing       1.67
comfortable   1.55
Cub           1.51
sharing       1.47
snug          1.45
trusting      1.42
admit         1.39
comfortably   1.38
improv        1.38
Activations Density 0.008%