    NEGATIVE LOGITS
    Token           Logit
    "<bos>"         -2.06
    " serve"        -0.78
    " assume"       -0.78
    "//"            -0.77
    " interact"     -0.77
    " provide"      -0.76
    " continue"     -0.75
    " establish"    -0.75
    " put"          -0.75
    "protected"     -0.75
    POSITIVE LOGITS
    Token           Logit
    " Juf"          2.49
    " affor"        2.42
    " impra"        2.29
    " sovere"       2.27
    " accla"        2.26
    " maneu"        2.25
    " depic"        2.24
    " Intere"       2.23
    " emphat"       2.21
    " squa"         2.20
    Activation Density: 1.088%

    No Known Activations