INDEX
    Negative Logits
    Token            Logit
    " dni"           -0.07
    " Colour"        -0.07
    " followers"     -0.07
    " emphasize"     -0.06
    " crimes"        -0.06
    "Level"          -0.06
    "kes"            -0.06
    " Belfast"       -0.06
    " clears"        -0.06
    "めて"           -0.06
    Positive Logits
    Token            Logit
    ".quality"        0.07
                      0.07
    " eser"           0.06
                      0.06
    "iflower"         0.06
    " NRA"            0.06
    "PushMatrix"      0.06
    "μένοι"           0.06
    " cuối"           0.06
    " कव"             0.06
    Activation Density: 0.004%

    No Known Activations