INDEX
    Explanations
    Negative Logits
    alph          -0.08
    Passengers    -0.08
    movies        -0.08
    outside       -0.08
     resulted     -0.07
     movies       -0.07
     beh          -0.07
     avy          -0.07
     microw       -0.07
    starting      -0.07
    Positive Logits
     Width        0.10
     dokter       0.09
     الأرب        0.08
     Heart        0.08
    _WIDTH        0.08
     caafima      0.08
     الخلف        0.08
    _width        0.08
     apẹrẹ        0.08
     Hearing      0.08
    Act Density 0.000%

    No Known Activations