    Negative Logits
    Token            Logit
     전세             -0.06
     ارز             -0.06
    run              -0.06
     instability     -0.06
     bv              -0.06
     budget          -0.06
     dziewcz         -0.05
    CEE              -0.05
     equity          -0.05
    _fecha           -0.05
    Positive Logits
    Token            Logit
     words            0.08
    Fel               0.07
     _{               0.07
                      0.07
     необхід          0.06
     Perception       0.06
     relu             0.06
    hours             0.06
     Declarations     0.06
     Words            0.06
    Activation Density: 0.024%

    No Known Activations