INDEX
    Explanations

    phrases that express disappointment or critique of movies

    Negative Logits
     mistr       -0.16
    owers        -0.15
    ulary        -0.14
    olla         -0.14
    aptic        -0.14
    illard       -0.14
    pheric       -0.13
    inand        -0.13
     cant        -0.13
     Canon       -0.13
    Positive Logits
     moderate    0.21
    лаж          0.19
     manageable  0.18
     harmless    0.18
     worst       0.17
     merely      0.17
     worse       0.17
     relatively  0.17
     Moderate    0.17
     mild        0.16
    Activation Density: 0.356%

    No Known Activations