INDEX
Explanations
References to the word "bat" and its variations
Negative Logits
hoff         -0.21
orgh         -0.16
parator      -0.15
oran         -0.15
adoo         -0.15
.nih         -0.15
Vit          -0.15
nees         -0.15
Operators    -0.15
idge         -0.15
Positive Logits
htub          0.26
Bat           0.23
Bat           0.21
wing          0.21
eman          0.20
tement        0.20
avia          0.20
bat           0.19
lle           0.19
ista          0.18
Activations Density: 0.010%