    Explanations

    categories denoted by letters

    Negative Logits
    " Save"          -0.07
    " موارد"         -0.07
    "Eth"            -0.07
    " astronauts"    -0.07
    "Seq"            -0.06
    "_ix"            -0.06
    " palate"        -0.06
    "/r"             -0.06
    "_rooms"         -0.06
    " teşkil"        -0.06
    Positive Logits
    "شنبه"           0.06
    "AGENT"          0.06
    ",如"            0.06
    " трансп"        0.06
    "tees"           0.06
    " técn"          0.06
    ".:.:."          0.06
                     0.06
    "WISE"           0.06
    " fuer"          0.06
    Act Density 0.036%

    No Known Activations