INDEX
Explanations
mentions of reptiles, particularly snakes
references to snakes and snake-related terminology
Negative Logits
estine      -0.80
vre         -0.76
encer       -0.68
eatures     -0.67
Beckham     -0.66
sts         -0.64
Neh         -0.63
ufact       -0.62
propos      -0.62
Universal   -0.62
Positive Logits
bite        1.17
snakes      1.07
guards      0.93
snake       0.92
Snake       0.91
venom       0.85
atos        0.84
opus        0.83
oche        0.83
reptiles    0.81
Activations Density: 0.018%