INDEX
    Explanations

    references to whales

    references to whales, specifically distinguishing between different types and contexts involving whales

    Negative Logits
    Token             Logit
    "ãĥ´ãĤ¡"          -0.87
    "Interstitial"    -0.86
    "senal"           -0.81
    "uers"            -0.75
    "yrinth"          -0.74
    "ggles"           -0.68
    "encing"          -0.67
    "DM"              -0.64
    "mble"            -0.64
    "PT"              -0.64
    Positive Logits
    Token             Logit
    " whale"          1.33
    " whales"         1.28
    "odon"            1.06
    " Whale"          1.02
    " sharks"         1.01
    " shark"          1.00
    " dolphins"       0.96
    "fish"            0.92
    " carc"           0.91
    " dolphin"        0.86
    Activation Density: 0.015%

    No Known Activations