INDEX
    Negative Logits
    Token               Logit
    " RIPRODUZIONE"     -0.86
    "s"                 -0.84
    "ThroughAttribute"  -0.77
    "shire"             -0.76
    "n"                 -0.75
    " zelve"            -0.74
    "ی"                 -0.73
    " ainfi"            -0.73
    "surate"            -0.73
    "liness"            -0.72

    Positive Logits
    Token               Logit
    "-]"                0.48
    " te"               0.48
    " u"                0.46
    " cat"              0.46
    " way"              0.45
    "tover"             0.45
    " turn"             0.45
    " tea"              0.44
    " ho"               0.44
    " sche"             0.43

    Activation Density: 0.219%

    No Known Activations