INDEX
    Explanations
    Negative Logits
    Token      Logit
     Cho       -0.96
     cho       -0.73
     Chatt     -0.73
     Bott      -0.71
     Barg      -0.71
     RELEASE   -0.69
     PT        -0.69
     TW        -0.69
     CHO       -0.68
     Prest     -0.68
    Positive Logits
    Token      Logit
    ia          1.69
    ian         1.64
    ians        1.47
    IA          1.29
    ias         1.27
    ial         1.27
    iae         1.26
    ios         1.25
    ía          1.24
    iah         1.24
    Activation Density: 0.119%

    No Known Activations