INDEX
    Explanations
    Negative Logits
     dwell    -0.07
     ful    -0.06
     pocket    -0.06
    ő    -0.06
     modal    -0.06
     sulf    -0.06
     openness    -0.06
    ots    -0.06
    Coding    -0.06
     captures    -0.06
    Positive Logits
    (className    0.07
     verwenden    0.06
     Consequently    0.06
     حالی    0.06
    InOut    0.06
     κυ    0.06
    ありが    0.06
    ことで    0.06
     Làm    0.06
     Airbus    0.06
    Act Density 0.006%

    No Known Activations