Explanations
This neuron looks for specific terms related to stories or events in science fiction or fantasy contexts.
Words related to the concept of sabotage.
Head Attr Weights
0:0.05
1:0.02
2:0.23
3:0.06
4:0.24
5:0.05
6:0.03
7:0.04
8:0.05
9:0.09
10:0.05
11:0.02
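A quick way to read the head attribution weights listed above is to rank them and see how much of the total the largest ones account for. The sketch below is illustrative only: the names `head_weights` and `top_heads` are hypothetical, the values are copied by hand from the listing, and treating the weights as comparable shares of one total is an assumption, not something the page states.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.05, 1: 0.02, 2: 0.23, 3: 0.06, 4: 0.24, 5: 0.05,
    6: 0.03, 7: 0.04, 8: 0.05, 9: 0.09, 10: 0.05, 11: 0.02,
}


def top_heads(weights, k=2):
    """Return the k (head, weight) pairs with the largest weight (hypothetical helper)."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


top = top_heads(head_weights)              # two largest heads
top_share = sum(w for _, w in top)         # their combined weight
total = sum(head_weights.values())         # sum of the listed (rounded) weights

print(top)
print(round(top_share, 2), round(total, 2))
```

With the values as listed, heads 4 and 2 rank first and second. The listed weights are rounded to two decimals, so their sum need not come to exactly 1.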
Negative Logits
bage: -1.55
wikipedia: -1.49
nces: -1.46
steen: -1.42
miah: -1.34
ndra: -1.33
aceae: -1.28
MAP: -1.27
untarily: -1.26
lishes: -1.25
Positive Logits
WARE: 1.37
bells: 1.36
pains: 1.20
ooth: 1.18
labor: 1.12
ratt: 1.06
oman: 1.06
ioxide: 1.05
Labor: 1.05
deb: 1.04
Activations Density 0.001%