INDEX
    Explanations

    numbers following certain punctuation

    Negative Logits
    Token       Value
    "("         1.03
    " be"       0.88
    " in"       0.75
    "'"         0.75
    "I"         0.73
    " that"     0.66
    " an"       0.63
    " le"       0.62
    "H"         0.59
    (blank)     0.58
    Positive Logits
    Token       Value
    "на"        0.96
    "ون"        0.83
    (blank)     0.78
    "ке"        0.75
    "ان"        0.73
    "માં"        0.71
    "тэй"       0.71
    "да"        0.70
    "良い"      0.69
    (blank)     0.68
    Act Density 1.313%

    No Known Activations