    Explanations

    mentions of specific locations

    Negative Logits
    Token         Logit
    " sappi"      -1.16
    " parteci"    -0.89
    " succede"    -0.83
    " apparti"    -0.80
    " ridu"       -0.80
    " vuol"       -0.79
    "bbene"       -0.79
    " migli"      -0.79
    " inol"       -0.77
    " altrett"    -0.77
    Positive Logits
    Token         Logit
    "<bos>"       0.95
    " there"      0.90
    " we"         0.75
    ","           0.71
    " they"       0.69
    " you"        0.68
    " alone"      0.64
    " it"         0.64
    "there"       0.63
    " There"      0.55
    Activation Density: 0.585%

    No Known Activations