INDEX
    Explanations
    Negative Logits
    (token not shown)  -0.87
    " a"               -0.56
    (whitespace)       -0.53
    " seven"           -0.52
    " w"               -0.52
    " five"            -0.50
    " nine"            -0.50
    " four"            -0.49
    " eight"           -0.49
    " amp"             -0.49
    Positive Logits
    " avoient"     0.94
    " feroit"      0.90
    " étoient"     0.87
    " auroit"      0.87
    " mauvaises"   0.85
    " dépens"      0.85
    " lèvres"      0.82
    " colorés"     0.81
    " démocr"      0.79
    " réguli"      0.78
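The two lists above rank vocabulary tokens by how strongly this feature presumably pushes their logits down (negative) or up (positive). A minimal sketch of how such a ranking could be produced, assuming the per-token effect is the feature's direction multiplied by an unembedding matrix. The names `direction`, `unembed`, `vocab`, and `top_logit_effects` are hypothetical and not taken from the dashboard's actual code.

```python
import numpy as np


def top_logit_effects(direction, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, effect) pairs.

    Assumption: direction has shape (d_model,), unembed has shape
    (d_model, vocab_size), and vocab maps token ids to token strings.
    """
    # Effect of the feature direction on every vocabulary logit.
    logits = direction @ unembed  # shape: (vocab_size,)
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Tiny toy example with a 2-dimensional residual stream and 4 tokens.
neg, pos = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.5, -0.2, 0.9, -0.8], [0.0, 0.0, 0.0, 0.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
print(neg)  # [('d', -0.8), ('b', -0.2)]
print(pos)  # [('c', 0.9), ('a', 0.5)]
```

Sorting once with `argsort` and reading both ends keeps the two lists consistent with each other; for a full vocabulary, `np.argpartition` would avoid a complete sort.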
    Activation Density 0.062%
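Activation density is commonly defined as the share of token positions on which a feature fires; 0.062% corresponds to roughly 62 of every 100,000 tokens. A minimal sketch, assuming a feature counts as active when its activation is strictly above zero. `activation_density` and `threshold` are hypothetical names, not the dashboard's own.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold`.

    Assumption: `acts` holds one activation value per token position.
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# 62 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:62] = 1.0
print(activation_density(acts))  # 0.062
```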

    No Known Activations