    NEGATIVE LOGITS
    Token                      Logit
    "nam"                      -0.07
    (token missing)            -0.07
    " nam"                     -0.07
    " Dominic"                 -0.06
    " כזה"                     -0.06
    "비스"                     -0.06
    ".instagram"               -0.06
    ".backend"                 -0.06
    ".people"                  -0.06
    " massacre"                -0.06
    POSITIVE LOGITS
    Token                      Logit
    "-row"                     0.07
    (token missing)            0.07
    " prostitut"               0.07
    "れている"                 0.07
    ".EntityFrameworkCore"     0.07
    "-block"                   0.07
    "iotic"                    0.07
    " cartridges"              0.07
    "_UTF"                     0.07
    " Nurs"                    0.06
    Activation Density: 0.009%

    No Known Activations