INDEX
Explanations
expressions of discomfort or unease
Head Attr Weights
0:0.09
1:0.07
2:0.09
3:0.08
4:0.08
5:0.08
6:0.06
7:0.08
8:0.09
9:0.08
10:0.07
11:0.08
Negative Logits
adapting
-2.11
conserve
-2.10
jurisd
-2.05
technically
-2.04
izoph
-2.02
lest
-2.02
contend
-1.99
exagger
-1.98
stren
-1.98
asserting
-1.97
Positive Logits
netflix
2.59
raq
2.15
ucer
2.08
els
2.07
{\
2.02
rice
1.98
ho
1.97
urtle
1.96
yahoo
1.96
ghost
1.96
Activations Density 0.000%