INDEX
    Explanations
    Negative Logits
    " recreational"   -0.08
    " moulin"         -0.07
    "_ing"            -0.07
    " socks"          -0.07
    "Agents"          -0.07
    " posing"         -0.07
    " Playground"     -0.07
    "_vectors"        -0.07
    ".ing"            -0.07
    "urtles"          -0.07
    POSITIVE LOGITS
    " criterio"       0.11
    "criterion"       0.10
    "Criterion"       0.10
    " Penal"          0.10
    " criterion"      0.10
    " penal"          0.09
    "尺度"            0.09
    " kriter"         0.09
    " criterios"      0.09
    " criteria"       0.09
    Act Density 0.002%

    No Known Activations