INDEX
    Explanations
    NEGATIVE LOGITS
    " agitation"    -0.06
    "hx"            -0.06
    " invasion"     -0.06
    " його"         -0.06
    " horn"         -0.06
    " judge"        -0.06
    " permits"      -0.06
    " squat"        -0.06
    "Brief"         -0.06
    "Accordion"     -0.06
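    POSITIVE LOGITS
    (token not rendered)    0.07
    (token not rendered)    0.07
    "abant"                 0.06
    (token not rendered)    0.06
    " نمی"                  0.06
    (token not rendered)    0.06
    "ंटर"                   0.06
    (token not rendered)    0.06
    ".S"                    0.06
    " +:+"                  0.06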
    Activation Density: 0.012%

    No Known Activations