INDEX
    Explanations
    Negative Logits
    Token       Logit
    " 72"       -0.87
    " turb"     -0.86
    " Tob"      -0.82
    "hal"       -0.80
    "72"        -0.78
    " Sabb"     -0.78
    " Ub"       -0.77
    " Typh"     -0.77
    "272"       -0.77
    " Tara"     -0.76
    Positive Logits
    Token       Logit
    "ie"        1.63
    "IE"        1.36
    "nie"       1.20
    "zie"       1.20
    "kie"       1.17
    "iew"       1.16
    "innie"     1.12
    "orie"      1.11
    "mie"       1.11
    "ies"       1.11
    Activation Density: 0.241%

    No Known Activations