    Explanations

    references to film titles and awards

    Negative Logits
    " Fashion"    -0.15
    "ћ"           -0.15
    "esser"       -0.15
    " Leban"      -0.14
    "ISBN"        -0.14
    "porter"      -0.14
    "tura"        -0.14
    "arem"        -0.13
    "iar"         -0.13
    "ř"           -0.13
    Positive Logits
    " foi"        0.29
    " te"         0.26
    " era"        0.23
    " estava"     0.23
    " tinha"      0.21
    "itou"        0.19
    " recebe"     0.19
    " fe"         0.19
    " trou"       0.19
    " fic"        0.19
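    The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A minimal sketch of how such lists can be produced, assuming (as is common for SAE dashboards, but not stated here) that each value is the dot product of the feature's decoder direction with that token's column of the unembedding matrix. The names `decoder_dir`, `unembed`, and `vocab` are hypothetical, and the numbers in the usage are toy values, not this feature's weights.

```python
# Sketch, assuming logit value = decoder direction . unembedding column.
# decoder_dir, unembed, and vocab are hypothetical toy inputs.

def logit_effects(decoder_dir, unembed):
    """Return one logit contribution per vocabulary token.

    decoder_dir: list of d_model floats (the feature's output direction).
    unembed: d_model rows, each a list of vocab_size floats (W_U).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(decoder_dir, unembed, vocab, k):
    """Return the k most positive and k most negative (token, logit) pairs."""
    effects = logit_effects(decoder_dir, unembed)
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    vocab = [" foi", " te", " Fashion", "ISBN"]
    decoder_dir = [1.0, 0.5]
    unembed = [
        [0.2, 0.1, -0.1, 0.0],
        [0.2, 0.3, -0.1, -0.2],
    ]
    pos, neg = top_and_bottom(decoder_dir, unembed, vocab, k=2)
    print(pos)
    print(neg)
```

    Real dashboards may also fold in a final layer norm or other scaling before the unembedding, so absolute values from this sketch need not match the ones listed.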
    Act Density 0.008%
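    Activation density is typically the percentage of sampled tokens on which the feature fires; 0.008% works out to roughly 8 tokens per 100,000. The sample size and firing threshold behind this figure are not given, so the zero threshold and the function name `activation_density` below are assumptions for illustration.

```python
# Sketch: percentage of tokens whose activation exceeds a threshold.
# The default threshold of 0.0 is an assumption, not the dashboard's setting.

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 8 active tokens out of 100,000 sampled tokens.
    acts = [1.0] * 8 + [0.0] * 99_992
    print(activation_density(acts))
```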

    No Known Activations