INDEX
    Explanations
    Negative Logits
    Token         Logit
    "orthand"     -0.07
    " whistle"    -0.06
    " histo"      -0.06
    "elop"        -0.06
    "ело"         -0.06
    "rails"       -0.06
    " губ"        -0.06
    " Pawn"       -0.06
    " ngang"      -0.06
    "lover"       -0.06
    Positive Logits
    Token         Logit
    " SDK"        0.11
    "DK"          0.10
    " sdk"        0.08
    ".sdk"        0.08
    "sdk"         0.08
    " inc"        0.07
    "pk"          0.07
    "=sum"        0.07
    "dk"          0.07
    "SDK"         0.07
    Activation Density: 0.003%

    No Known Activations