    Negative Logits
    Token             Logit
    "ولا"             -0.07
    "aked"            -0.06
    " оформ"          -0.06
    "imator"          -0.06
    "вер"             -0.06
    "ber"             -0.06
    " ALTER"          -0.06
    " nationalist"    -0.06
    " Arg"            -0.06
    "’ya"             -0.06
    Positive Logits
    Token             Logit
    " consolidate"    0.07
    "weights"         0.07
    "Classification"  0.07
    "(Encoding"       0.07
    " offline"        0.07
    " extras"         0.06
    "<File"           0.06
    " Din"            0.06
    " نمایش"          0.06
    "_aut"            0.06
    Activation Density: 0.029%

    No Known Activations