    Negative Logits
     himself       0.86
     yourself      0.80
    став           0.73
    ")}}'          0.69
     itself        0.69
    ")}}           0.68
     themselves    0.68
     oneself       0.68
    হ্য            0.66
     attribute     0.66
    Positive Logits
    approx         0.68
     Earlier       0.65
     aprox         0.64
                   0.63
    near           0.63
    ↵↵             0.63
    कुछ            0.63
     gần           0.62
     few           0.62
    late           0.62
    Activation Density: 0.205%

    No Known Activations