INDEX
    Explanations
    Negative Logits
    urlpatterns  -0.09
     weder  -0.08
     ikibazo  -0.08
     egal  -0.08
     apont  -0.08
     Díaz  -0.07
     sekitar  -0.07
     amak  -0.07
     खर  -0.07
     wszystkim  -0.07
    Positive Logits
     significance  0.08
    erven  0.08
     interpreting  0.08
    Notes  0.07
     rationale  0.07
    _description  0.07
    Reasons  0.07
     Bedeutung  0.07
     explanation  0.07
    Interpret  0.07
    Act Density 0.032%

    No Known Activations