    Negative logits
    Token          Logit
    "yaw"          -0.14
    "ago"          -0.14
    "osome"        -0.14
    " protests"    -0.14
    "acman"        -0.13
    "atura"        -0.13
    " Gro"         -0.13
    "ten"          -0.13
    " ire"         -0.13
    "abela"        -0.13
    Positive logits
    Token              Logit
    "aliz"             0.17
    "904"              0.16
    "cola"             0.15
    ".Aggressive"      0.15
    " ofType"          0.15
    "ĭ"                0.15
    "esti"             0.14
    "меш"              0.14
    "ovit"             0.14
    "StateException"   0.14
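The page does not state how these logit lists were computed. One common approach for sparse-autoencoder features, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and report the k smallest and k largest entries. The sketch below uses plain Python lists; the names `logit_weights` and `top_negative_positive` and the toy weights are hypothetical and are not taken from the source.

```python
from typing import List, Sequence, Tuple

def logit_weights(decoder_vec: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix W_U laid out as [d_model][d_vocab]."""
    d_model, d_vocab = len(W_U), len(W_U[0])
    assert len(decoder_vec) == d_model, "decoder_vec must have length d_model"
    return [sum(decoder_vec[i] * W_U[i][j] for i in range(d_model))
            for j in range(d_vocab)]

def top_negative_positive(
    decoder_vec: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, logit) pairs."""
    logits = logit_weights(decoder_vec, W_U)
    order = sorted(range(len(logits)), key=lambda j: logits[j])  # ascending
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive

# Toy example with made-up numbers (d_model = 2, vocabulary of 4 tokens).
vocab = ["a", "b", "c", "d"]
decoder_vec = [1.0, 2.0]
W_U = [[0.5, -1.0, 0.0, 0.25],
       [0.0, 0.25, -0.5, 0.5]]
negative, positive = top_negative_positive(decoder_vec, W_U, vocab, k=2)
print(negative)  # [('c', -1.0), ('b', -0.5)]
print(positive)  # [('d', 1.25), ('a', 0.5)]
```

With a real model, `vocab` would be the tokenizer's vocabulary and k would be 10 to match the lists above; dashboards may also normalize the decoder vector first, which would rescale the values.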
    Activation density: 0.041%
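The page does not say how activation density was measured. A common definition, assumed here, is the percentage of token positions on which the feature's activation is above zero. The function `activation_density` and its `threshold` parameter are hypothetical; the second print is just arithmetic on the 0.041% figure shown above.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# 1 of 4 positions is active, so the density is 25 percent.
print(activation_density([0.0, 0.3, 0.0, 0.0]))  # 25.0

# At 0.041%, the expected number of active tokens in a 100,000-token sample:
print(round(0.041 / 100 * 100_000))  # 41
```

In other words, under this definition the feature fires on roughly 41 tokens out of every 100,000.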

    No known activations.