INDEX
    Negative Logits
    Token            Value
    " khiến"         0.47
    " współpr"       0.44
    "允許"           0.43
    " omoguć"        0.42
    "veranst"        0.42
    "válto"          0.41
    "ച്ച്"            0.41
    "د"              0.41
    "potentially"    0.40
    "্কা"            0.40

    Positive Logits
    Token            Value
    " by"            0.57
    " who"           0.53
    " in"            0.47
    "("              0.44
    "3"              0.42
    "에게"           0.42
    ")}"             0.41
    "),"             0.40
    " btw"           0.40
    "B"              0.39
    Activation Density: 0.065%

    No Known Activations