INDEX
    Negative Logits
    " omen"     -0.08
    " cof"      -0.07
    " پا"       -0.07
    ".plan"     -0.07
                -0.07
    "Ar"        -0.07
    ".Ar"       -0.07
    " spared"   -0.07
    "Early"     -0.07
    ".pb"       -0.07
    Positive Logits
    " irresist"   0.09
    " airflow"    0.08
    " promos"     0.08
    " downright"  0.08
    " downfall"   0.07
    "rews"        0.07
    " MON"        0.07
    "cdecl"       0.07
    " loci"       0.07
    " قي"         0.07
    Act Density 0.010%

    No Known Activations