INDEX
    NEGATIVE LOGITS
    ViewFeatures      -0.54
    RegressionTest    -0.51
    pirazione         -0.49
    ItemBackground    -0.48
    yatı              -0.47
    Sortie            -0.46
    Ethnicity         -0.46
    eventbus          -0.46
    pagnol            -0.45
    WHM               -0.45

    POSITIVE LOGITS
    ns                 0.54
    dy                 0.53
    dog                0.52
    hand               0.52
    man                0.51
     cães              0.51
    GEBURTSDATUM       0.50
     cão               0.49
    ny                 0.49
     enforce           0.49
    Activation Density: 0.002%

    No Known Activations