Explanations
phrases related to detection and sensory experiences
Head Attr Weights
0:0.01
1:0.02
2:0.06
3:0.21
4:0.02
5:0.03
6:0.06
7:0.09
8:0.07
9:0.15
10:0.07
11:0.14
Negative Logits
arsity   -1.18
enment   -1.15
装       -1.14
hement   -1.12
ヘ       -1.11
)=(      -1.10
Cube     -1.06
GBT      -1.06
��       -1.05
ailand   -1.04
Positive Logits
mouth    1.22
toxins   1.10
noses    1.09
gaping   1.07
pulse    1.06
rows     1.05
neurot   1.05
sniff    1.04
mustard  1.03
juices   1.01
Activations Density 0.002%