    Explanations

    references to various countries, with a specific focus on Italy

    Negative Logits
    <bos>        -2.57
    Allora       -0.81
                 -0.72
    Quindi       -0.63
    Rispondi     -0.62
     prevent     -0.61
     since       -0.61
    Tutto        -0.61
     maximize    -0.60
    Inoltre      -0.60
    Positive Logits
     aen         1.72
     ftu         1.60
     lele        1.57
     fta         1.55
     thut        1.54
     jaya        1.50
     tew         1.48
     myn         1.48
     vns         1.48
     mef         1.47
    Activation Density: 0.169%

    No Known Activations