    Explanations

    titles of films, books, and articles

    Negative Logits
     xxiii          -0.56
     xxv            -0.56
    Bedankt         -0.56
    Veel            -0.56
     xxvi           -0.55
    Jakie           -0.55
     regardant      -0.54
    Prí             -0.53
    Pře             -0.53
     whofe          -0.53
    Positive Logits
     minimalis      0.82
     palab          0.77
     utop           0.73
     gmbh           0.72
     demen          0.71
     kuns           0.70
     abnorm         0.70
     lapto          0.70
     verba          0.69
     pietre         0.69
    Activation Density: 0.318%

    No Known Activations