    NEGATIVE LOGITS
    " sarebbero"         0.50
    (token not shown)    0.43
    " mydict"            0.42
    " seraient"          0.42
    "’।"                 0.42
    "eraient"            0.41
    " esimerkiksi"       0.41
    (token not shown)    0.41
    "’)."                0.40
    " avaient"           0.40
    POSITIVE LOGITS
    "u"                  0.37
    "an"                 0.36
    "N"                  0.36
    "co"                 0.35
    "T"                  0.34
    "the"                0.34
    (token not shown)    0.33
    "CO"                 0.33
    "pr"                 0.32
    "The"                0.32
    Activation Density: 0.000%

    No Known Activations