INDEX
    Explanations
    NEGATIVE LOGITS
    ant         0.87
    is          0.79
    ۔           0.78
    which       0.77
    up          0.71
    years       0.70
    il          0.69
    diamonds    0.68
    when        0.68
    ک           0.66
    POSITIVE LOGITS
     whale      1.17
     whales     1.10
    [missing]   0.93
     Whale      0.91
    🐋          0.82
     by         0.82
    [missing]   0.77
    🐳          0.74
     regroup    0.71
     com        0.70
    Act Density 0.010%

    No Known Activations