    Explanations

    articles and prepositions

    Negative Logits
    (not shown)   0.59
    (not shown)   0.54
    "s"           0.53
    "l"           0.52
    "a"           0.51
    "n"           0.48
    "unta"        0.47
    "});"         0.44
    "azioni"      0.43
    " toDo"       0.43
    Positive Logits
    " the"        0.81
    " a"          0.67
    "ppled"       0.66
    " have"       0.63
    "The"         0.63
    "the"         0.62
    "A"           0.62
    "ش"           0.59
    " an"         0.59
    (not shown)   0.59
    Activation Density: 0.269%

    No Known Activations