INDEX
    Explanations

    skeptical or skepticism

    Negative Logits
    ظر    0.91
    ‌ی    0.88
    [blank]    0.86
    였다    0.83
    なか    0.82
    [blank]    0.80
    om    0.79
     starfish    0.78
    なが    0.77
    री    0.77
    Positive Logits
    к    1.07
    an    0.97
    0    0.95
    (    0.91
    '    0.89
    a    0.89
    that    0.88
    K    0.84
    [blank]    0.84
    in    0.82
    Act Density 0.002%

    No Known Activations