    Negative Logits
     firm's         -0.08
                    -0.08
     surcharge      -0.08
    शी              -0.08
     overseas       -0.07
     breakup        -0.07
     family's       -0.07
     कंट            -0.07
     घट             -0.07
     anger          -0.07
    Positive Logits
    roles           0.12
     roles          0.12
     symmetry       0.11
     symmetric      0.10
     Rollen         0.10
     symmetrical    0.10
    _roles          0.09
     Roles          0.09
     swapped        0.09
    Roles           0.09
    Act Density 0.021%

    No Known Activations