INDEX
Explanations
references to rabbits
references to rabbits and related imagery
Negative Logits
omething      -0.90
igmat         -0.79
ician         -0.76
inia          -0.76
inen          -0.74
rylic         -0.74
ructure       -0.73
orie          -0.70
itutional     -0.69
eatures       -0.69
Positive Logits
MQ            1.07
rabbit        0.94
Rabbit        0.94
Hole          0.91
meat          0.88
rabbits       0.83
fish          0.80
Nest          0.79
Hunt          0.78
gey           0.75
Activations Density: 0.013%