Explanations
references to whales
references to whales, specifically distinguishing between different types and contexts involving whales
Negative Logits
Token           Logit
ãĥ´ãĤ¡          -0.87
Interstitial    -0.86
senal           -0.81
uers            -0.75
yrinth          -0.74
ggles           -0.68
encing          -0.67
DM              -0.64
mble            -0.64
PT              -0.64
Positive Logits
Token           Logit
whale           1.33
whales          1.28
odon            1.06
Whale           1.02
sharks          1.01
shark           1.00
dolphins        0.96
fish            0.92
carc            0.91
dolphin         0.86
Activations Density: 0.015%
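
As a rough illustration of how to read the density figure, the sketch below assumes that activation density means the fraction of sampled tokens on which the feature fires (activation above zero). That definition, the function name `activation_density`, and the sample data are assumptions made for illustration and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature is active.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 3 of 20,000 tokens.
acts = [0.0] * 20000
acts[5], acts[700], acts[19999] = 2.1, 0.4, 7.3

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.015% corresponds to a very sparse feature that is active on roughly 15 out of every 100,000 tokens.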