    Negative Logits
    Token          Value
    " seamless"    -0.08
    " hypotheses"  -0.07
    " begun"       -0.07
    "igue"         -0.07
    " Victory"     -0.07
    "afa"          -0.06
    " Peace"       -0.06
    " Easter"      -0.06
    " penalty"     -0.06
    " adel"        -0.06

    Positive Logits
    Token          Value
    " drinks"      0.11
    " drink"       0.10
    " Drinks"      0.10
    "drink"        0.10
    " Drink"       0.08
    "&type"        0.08
    "_ENCOD"       0.07
    "VarInsn"      0.07
    " дан"         0.07
    "Drink"        0.07

    Activation Density: 0.005%

    No Known Activations