    Negative Logits
    Token              Logit
    "PA"               -0.09
    " hypertension"    -0.08
    " ROS"             -0.08
    " ozone"           -0.08
    " PA"              -0.08
    " OPEN"            -0.08
    " blauwe"          -0.08
    "indows"           -0.07
    "OPEN"             -0.07
    " وڏي"             -0.07
    Positive Logits
    Token              Logit
    " catchy"          0.09
    [not rendered]     0.09
    [not rendered]     0.08
    " художе"          0.08
    " ringan"          0.08
    " atrap"           0.08
    "enin"             0.08
    " entret"          0.08
    " rendre"          0.08
    " lay"             0.08
    Activation Density: 0.003%

    No Known Activations