INDEX
    Explanations

    statements reflecting opinions or evaluations about entities or situations

    Negative Logits
     betweenstory   -0.82
    hyrchwyd        -0.76
     pleaſure       -0.74
     oprot          -0.68
     houſe          -0.68
     ſmall          -0.68
     ſever          -0.67
     Majefty        -0.66
     occaf          -0.65
     Shakspeare     -0.65
    Positive Logits
    IRQn            0.58
    WriteBarrier    0.55
    QUI             0.54
     fast           0.51
    cely            0.48
    되지              0.48
     كومونز         0.48
     szóci          0.48
     sé             0.48
    angan           0.47
    Activation Density: 0.372%

    No Known Activations