INDEX
    Explanations
    Negative Logits
    Token         Logit
    " virtue"     -0.09
    "-Nov"        -0.09
    " বন"         -0.08
    " पल"         -0.08
    " vertu"      -0.08
    " Lon"        -0.08
    " sustain"    -0.08
    " nate"       -0.08
    " stink"      -0.08
    " ווע"        -0.08

    Positive Logits
    Token         Logit
    "Output"      0.08
    "Data"        0.08
    "addon"       0.08
    "Addon"       0.08
    " terra"      0.07
    "Emb"         0.07
    "ذ"           0.07
    "IAM"         0.07
    "Add"         0.07
    "Click"       0.07
    Act Density 0.002%

    No Known Activations