    NEGATIVE LOGITS
    Token             Logit
    " variabile"      0.82
    " descarga"       0.73
    "erebbe"          0.73
    " acciaio"        0.72
    [token missing]   0.71
    " eczema"         0.71
    " govor"          0.70
    " regarde"        0.69
    "aspetto"         0.69
    " асо"            0.69
    POSITIVE LOGITS
    Token             Logit
    " to"             1.01
    "↵↵"              0.94
    "to"              0.93
    "6"               0.92
    "LA"              0.86
    "3"               0.82
    "DA"              0.80
    "ä"               0.80
    " of"             0.77
    "7"               0.76
    Activation Density: 0.004%

    No Known Activations