    Negative Logits
    " 여행"            -0.07
    " desperation"   -0.07
    " riding"        -0.07
    " tasarım"       -0.07
    " danger"        -0.07
    " onları"        -0.07
    " پاسخ"          -0.06
    " vyjád"         -0.06
    "_damage"        -0.06
    "ikat"           -0.06

    Positive Logits
    " abolished"      0.14
    " abolish"        0.13
    " abol"           0.12
    " abolition"      0.10
    "obsolete"        0.07
    " banned"         0.06
    " Club"           0.06
    " Anth"           0.06
    " scrapped"       0.06
    " abi"            0.06

    Activation Density: 0.003%

    No Known Activations