Explanations
phrases that express specific preferences or opinions
Head Attribution Weights (head:weight)
0:0.03
1:0.01
2:0.10
3:0.06
4:0.06
5:0.03
6:0.06
7:0.39
8:0.04
9:0.04
10:0.08
11:0.06
Negative Logits
Valhalla       -1.79
Suite          -1.69
inav           -1.67
Symbol         -1.63
unification    -1.57
Walls          -1.56
Unity          -1.53
Neuroscience   -1.50
Statue         -1.49
srfAttach      -1.49
Positive Logits
cull           1.83
emouth         1.80
lett           1.79
offend         1.73
ppe            1.67
vying          1.66
enqu           1.59
pests          1.56
assail         1.55
lest           1.53
Activations Density 0.000%