    Negative Logits
    Token           Logit
    " Bots"         -0.81
    " Pengu"        -0.78
    "ela"           -0.78
    "mosp"          -0.78
    "mb"            -0.74
    " Âł Âł Âł Âł"     -0.74
    " Pascal"       -0.73
    " BET"          -0.73
    "brance"        -0.72
    "ibo"           -0.72
    Positive Logits
    Token           Logit
    "12"            1.46
    "13"            1.18
    " 12"           1.17
    "11"            1.12
    " XII"          1.06
    " 11"           0.96
    " 13"           0.95
    "14"            0.94
    "ł"             0.91
    " twelve"       0.88
    Activation Density: 0.071%

    No Known Activations