    Negative Logits
    Token            Logit
    "Remaining"      -0.07
    " Rate"          -0.07
    " detriment"     -0.06
    " vaccination"   -0.06
    " Ergebn"        -0.06
                     -0.06
    "AAAA"           -0.06
    "ใช"             -0.06
    " totalCount"    -0.06
    " Rent"          -0.06
    Positive Logits
    Token                  Logit
    " willingly"           0.08
    " knowingly"           0.08
    " фіз"                 0.07
    "PropertyDescriptor"   0.07
    "shed"                 0.07
    " intentional"         0.07
    " deliberately"        0.07
    "çak"                  0.06
    " Saudi"               0.06
    " blatant"             0.06
    Activation Density: 0.004%

    No Known Activations