    NEGATIVE LOGITS
     farms               -0.07
     Pill                -0.07
     ам                  -0.07
                         -0.07
     علیه                -0.06
     Pair                -0.06
     prostřednictvím     -0.06
    =dict                -0.06
    competitive          -0.06
    {↵↵↵                 -0.06
    POSITIVE LOGITS
     Laud                0.07
    -trigger             0.06
     gim                 0.06
    arms                 0.06
     blí                 0.06
                         0.06
    ัณฑ                  0.06
    ламент               0.06
    HeaderView           0.06
    (book                0.06
    Activation Density: 0.139%

    No Known Activations