    Negative Logits
    "jan"           -0.07
    " uppercase"    -0.07
    " zoals"        -0.07
    " بل"           -0.06
    " beige"        -0.06
    " Cyrus"        -0.06
    "pii"           -0.06
    " Flower"       -0.06
    " nước"         -0.06
    " Aurora"       -0.06
    Positive Logits
    ".Encoding"     0.07
    ".Pointer"      0.06
    "oons"          0.06
    "_import"       0.06
                    0.06
    " Appeals"      0.06
    " kişilerin"    0.06
    " prefers"      0.06
    ".Groups"       0.06
    " свид"         0.06
    Activation Density: 0.331%

    No Known Activations