INDEX
    Explanations

    thoroughbred

    Negative Logits
    Token             Logit
    "адж"             -0.07
                      -0.07
    "dz"              -0.06
    "807"             -0.06
    "ilo"             -0.06
    "esin"            -0.06
    " Roosevelt"      -0.06
    "(pl"             -0.06
    " кан"            -0.06
    "هور"             -0.06
    Positive Logits
    Token             Logit
    " gerçek"         0.07
    " heartbeat"      0.07
    " табли"          0.06
    " compressed"     0.06
    " соверш"         0.06
    " flea"           0.06
    " chờ"            0.06
    " благодаря"      0.06
    "ğü"              0.06
    " comprehensive"  0.06
    Activation Density: 0.001%

    No Known Activations