    NEGATIVE LOGITS
    yang        -0.07
    agrid       -0.07
    .table      -0.07
     detained   -0.06
    riors       -0.06
     Ка         -0.06
    inand       -0.06
    bay         -0.06
     matt       -0.06
    iliation    -0.06
    POSITIVE LOGITS
    phins       0.08
    .pattern    0.06
    Jud         0.06
     ERROR      0.06
    ANTITY      0.06
     süresi     0.06
    /control    0.06
     footwear   0.06
    --          0.06
    <ID         0.06
    Act Density 0.002%

    No Known Activations