INDEX
    Explanations
    NEGATIVE LOGITS
     ورز
    -0.07
    Army
    -0.07
    Station
    -0.07
    million
    -0.07
    獲得
    -0.07
     науки
    -0.07
     Vườn
    -0.07
    _WORK
    -0.06
     malaria
    -0.06
     crashes
    -0.06
    POSITIVE LOGITS
    .sap
    0.07
    0.07
    ahrenheit
    0.06
     ukaz
    0.06
     файл
    0.06
         ↵↵
    0.06
     IndexPath
    0.06
     خط
    0.06
    [*
    0.06
     رج
    0.06
    Activation Density: 0.029%

    No Known Activations