    Negative Logits
    Token                   Logit
    "('*"                   -0.07
    " Crack"                -0.07
    " ignorant"             -0.07
    " NSStringFromClass"    -0.06
    " shocked"              -0.06
    ".IsNullOr"             -0.06
    " '!"                   -0.06
    " '%"                   -0.06
    " bych"                 -0.06
    " Winston"              -0.05
    Positive Logits
    Token                   Logit
    "2"                     0.24
    " two"                  0.14
    "۲"                     0.13
    " Two"                  0.13
    "4"                     0.13
    "3"                     0.11
    (token not recovered)   0.11
    "Two"                   0.11
    (token not recovered)   0.10
    "two"                   0.10
    Activation Density: 0.603%

    No Known Activations