INDEX
Explanations
references to rabbits and related diseases
Negative Logits
Token    Logit
anske    -0.16
atro     -0.16
wnd      -0.16
fish     -0.15
zk       -0.15
thane    -0.15
nicos    -0.15
ấu       -0.15
dog      -0.14
gars     -0.14
Positive Logits
Token    Logit
rabbit   0.51
rabbits  0.48
Rabbit   0.46
rabbit   0.46
Rab      0.43
bunny    0.41
Bunny    0.38
.rabbit  0.36
rab      0.32
hare     0.28
Activations Density 0.012%
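
The activation density figure above can be read as the share of sampled token positions on which the feature fires. The sketch below illustrates that reading under stated assumptions: the page does not define the metric, so the "activation greater than zero" criterion, the function name `activation_density`, and the toy data are all hypothetical and chosen only to reproduce a value of 0.012%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature counts as "active" on a token when its activation
    is strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 3 active positions among 25,000 sampled tokens.
toy = [0.0] * 24997 + [1.7, 0.4, 2.9]
density = activation_density(toy)
print(f"{density:.3%}")  # prints "0.012%"
```

A density this low means the feature is sparse: under this reading it fires on roughly 1 in 8,300 tokens, which is consistent with a feature tied to a narrow topic such as rabbits.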