INDEX
    Negative Logits
    "ppo"           -0.17
    "343"           -0.16
    "egral"         -0.15
    " Dro"          -0.15
    "terminal"      -0.15
    "olini"         -0.15
    " Cer"          -0.15
    " terminal"     -0.15
    "éra"           -0.14
    "926"           -0.14
    Positive Logits
    " basis"        0.25
    "basis"         0.22
    " behalf"       0.20
    " occasions"    0.20
    " Basis"        0.19
    "ires"          0.17
    " Rosen"        0.16
    "ingleton"      0.16
    "_basis"        0.15
    "assis"         0.15
    Activation Density: 0.028%

    No Known Activations