    Negative Logits
    Token             Logit
    "mir"             -0.08
    " champagne"      -0.08
    " vantage"        -0.08
    " sf"             -0.08
    " bop"            -0.08
    "lidir"           -0.07
    " spruce"         -0.07
    " sublim"         -0.07
    "Charles"         -0.07
    " Rip"            -0.07
    Positive Logits
    Token             Logit
    ".ser"            0.08
    " terang"         0.08
    " Richardson"     0.08
    " Killing"        0.08
    " humanitarian"   0.08
    " boleh"          0.08
    (blank)           0.08
    "-ROM"            0.07
    " ويب"            0.07
    " Loko"           0.07
    Activation Density: 0.003%

    No Known Activations