INDEX
    Explanations
    Negative Logits
    "partment"          -0.07
    "Arg"               -0.07
    " holy"             -0.06
                        -0.06
    "_coordinates"      -0.06
    "Instances"         -0.06
    "ومی"               -0.06
    " виник"            -0.06
    "なかった"            -0.06
    " Where"            -0.06
    Positive Logits
    " seller"           0.07
    "dress"             0.06
    " distractions"     0.06
    " lure"             0.06
    " голос"            0.06
    " interruptions"    0.06
    "(boost"            0.06
    "roje"              0.06
    " distributes"      0.06
    " werk"             0.06
    Act Density 0.009%

    No Known Activations