    Negative Logits
    Token             Logit
    ".Scheme"         -0.06
    "ence"            -0.06
    "().↵"            -0.05
    "urrency"         -0.05
    ";\↵"             -0.05
    "sko"             -0.05
    "isc"             -0.05
    " Reed"           -0.05
    " salah"          -0.05
    "Fact"            -0.05
    Positive Logits
    Token             Logit
    " 알아"           0.07
    " connects"       0.07
    "-xs"             0.07
                      0.07
    " '-',"           0.07
    "isory"           0.07
    " native"         0.06
    "-transparent"    0.06
    " норм"           0.06
    "oref"            0.06
    Activation Density: 0.004%

    No Known Activations