    Negative Logits
     Jefus          -1.14
    Either          -1.09
    either          -1.05
     either         -1.05
     pleaſure       -0.96
     himſelf        -0.95
     themſelves     -0.95
     ſta            -0.94
     itſelf         -0.94
     ſche           -0.93
    Positive Logits
    AndEndTag           0.74
     resourceCulture    0.67
     off                0.65
     '{@                0.63
     «                  0.63
    <eos>               0.60
    awtextra            0.59
     فريبيس             0.58
     propOrder          0.58
     Off                0.58
    Act Density 0.190%

    No Known Activations