INDEX
    Explanations

    terms and concepts associated with entertainment and media reviews

    Negative Logits
     ostavi        -0.92
    enderror       -0.91
     للاسماء       -0.86
     ſever         -0.85
     pleaſure      -0.83
     Reſ           -0.83
     ſche          -0.82
     дописавши     -0.82
     Majefty       -0.81
     ſtate         -0.80
    Positive Logits
    ,              1.08
     but           0.94
     however       0.82
     and           0.81
     in            0.69
    ますが         0.66
     as            0.65
     though        0.64
     because       0.64
     with          0.63
    Activation Density: 1.644%

    No Known Activations