INDEX
Explanations
references to rabbits and bunny-related terms
instances of the words "rabbit" and "bunny."
Negative Logits
omething      -0.93
ician         -0.85
igmat         -0.84
inia          -0.78
rylic         -0.78
orie          -0.77
itutional     -0.76
ructure       -0.75
xit           -0.73
inen          -0.73
Positive Logits
MQ            1.18
Hole          0.96
rabbit        0.88
meat          0.86
Rabbit        0.85
Wilde         0.82
Nest          0.79
rabbits       0.76
Hunt          0.72
wald          0.69
Activations Density: 0.021%