    Negative Logits
     '`            -0.08
     ranger        -0.08
     caminho       -0.08
     '/            -0.08
    Gab            -0.07
     caminhos      -0.07
    [level         -0.07
    afs            -0.07
    tee            -0.07
    Snapshots      -0.07
    Positive Logits
     formular      0.09
     friction      0.09
                   0.08
     frivol        0.08
                   0.08
    污染           0.08
     pady          0.08
                   0.08
     التهاب        0.08
     French        0.08
    Act Density 0.002%

    No Known Activations