INDEX
    Explanations

    references to films and their characteristics

    NEGATIVE LOGITS
     ráp
    -0.22
     italiana
    -0.21
     pública
    -0.20
    /he
    -0.20
     política
    -0.19
     herself
    -0.19
     gratuita
    -0.19
    ordova
    -0.18
     الأمريكية
    -0.18
     mesma
    -0.17
    POSITIVE LOGITS
     himself
    0.25
     stesso
    0.21
     نفسه
    0.19
     العربي
    0.19
     الذي
    0.18
     uveden
    0.16
     koji
    0.16
     abi
    0.16
    /she
    0.15
     plank
    0.15
    Act Density 0.314%

    No Known Activations