Explanations
phrases related to health and sickness
Head Attr Weights
0:0.06
1:0.04
2:0.13
3:0.04
4:0.04
5:0.05
6:0.23
7:0.04
8:0.03
9:0.24
10:0.02
11:0.02
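The weights above can be ranked to see which attention heads contribute most to this feature. The following is a minimal sketch, assuming the listed values are per-head attribution weights keyed by head index. The names `head_weights` and `top_heads` are hypothetical and not part of the dashboard.

```python
# Head index -> attribution weight, copied from the list above.
head_weights = {
    0: 0.06, 1: 0.04, 2: 0.13, 3: 0.04, 4: 0.04, 5: 0.05,
    6: 0.23, 7: 0.04, 8: 0.03, 9: 0.24, 10: 0.02, 11: 0.02,
}


def top_heads(weights, k=3):
    """Return the k head indices with the largest weights, largest first."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


top3 = top_heads(head_weights)          # [9, 6, 2]
total = sum(head_weights.values())      # sum of the listed (rounded) weights
top3_share = sum(head_weights[h] for h in top3) / total

print("top heads:", top3)
print("share of listed weight held by top 3: {:.2f}".format(top3_share))
```

Heads 9, 6, and 2 carry most of the listed weight. The listed values are rounded to two decimals, so their sum need not be exactly 1.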
Negative Logits
Zhou        -3.61
ulia        -3.57
imble       -3.52
ragon       -3.36
arov        -3.32
ixel        -3.32
VB          -3.30
Decoder     -3.27
iframe      -3.26
Zimmerman   -3.23
Positive Logits
sick        7.88
Sick        7.82
sickness    7.67
illness     5.92
illnesses   5.66
health      4.22
Health      4.13
coughing    4.11
cured       4.06
health      4.02
Activations Density 0.003%