INDEX
    Explanations

    instances of the word "shark"

    Negative Logits
    Token       Logit
    "ISTER"     -0.82
    "ories"     -0.76
    " Hemp"     -0.72
    "ISION"     -0.67
    "haar"      -0.66
    "ijah"      -0.64
    "ndra"      -0.64
    "onsense"   -0.64
    "mble"      -0.63
    "icably"    -0.63

    Positive Logits
    Token       Logit
    "ulic"      0.96
    " fins"     0.88
    "fish"      0.84
    "vati"      0.82
    "izont"     0.80
    " sharks"   0.80
    "mong"      0.80
    "iform"     0.78
    "bite"      0.77
    " Sharks"   0.77

    Activation Density: 0.018%

    No Known Activations