INDEX
    Explanations
    Negative Logits
     bloss        -0.07
     bisher       -0.06
     partitions   -0.06
     endl         -0.06
     Ör           -0.06
     başlan       -0.06
     техніч       -0.06
    詳細           -0.06
     Bloss        -0.06
                  -0.06
    Positive Logits
     racist       0.14
     racism       0.12
    CM            0.07
     Across       0.07
     overriding   0.07
    ING           0.07
     coke         0.07
    .balance      0.06
     Crunch       0.06
     سین          0.06
    Act Density 0.004%

    No Known Activations