Explanations
references to the health and well-being of whales
Negative Logits
nest       -0.19
rabbit     -0.18
rabbits    -0.18
lattice    -0.17
atti       -0.17
udit       -0.15
spiders    -0.15
lizard     -0.15
Nest       -0.15
Tro        -0.14
Positive Logits
cet        0.36
whales     0.32
bott       0.31
whale      0.30
pod        0.29
pods       0.29
Killer     0.28
killer     0.28
sperm      0.28
Pods       0.27
Activations Density: 0.140%