INDEX
    Negative Logits
    "Naz"         -0.07
    "	Write"      -0.07
    " گی"         -0.07
    "Targets"     -0.06
    " campaign"   -0.06
    "Escape"      -0.06
    " wouldn"     -0.06
    "่งข"         -0.06
    " beberapa"   -0.06
    "!!}↵"        -0.06
    Positive Logits
    " carpet"     0.06
    "inke"        0.06
    " Goose"      0.06
    " кожного"    0.06
    "silent"      0.06
    "/device"     0.06
    "-base"       0.06
    "олева"       0.06
    "基地"        0.06
    ".orange"     0.06
    Act Density 0.016%

    No Known Activations