INDEX
    Explanations
    Negative Logits
     languages    -0.07
     mountains    -0.07
     True         -0.07
     doubts       -0.07
     Moses        -0.07
     margins      -0.07
    Pixels        -0.07
     Rust         -0.06
     escapes      -0.06
     extinct      -0.06
    Positive Logits
    NoArgsConstructor    0.08
     Product             0.07
    promotion            0.07
     desirable           0.07
    adora                0.07
    province             0.07
     بهره                0.07
     mejorar             0.07
     Georgian            0.06
    ',)↵                 0.06
    Activation Density: 0.006%

    No Known Activations