INDEX
    Explanations

    percentage numbers and units

    Negative Logits
    Token                  Value
    "ט"                    0.50
    "ור"                   0.49
    "N"                    0.49
    "ى"                    0.46
    (token missing)        0.44
    (token missing)        0.44
    "enaar"                0.43
    " Temmuz"              0.43
    (token missing)        0.42
    "}$="                  0.42
    Positive Logits
    Token                  Value
    "is"                   0.56
    " contra"              0.54
    " _"                   0.50
    " __"                  0.49
    " fission"             0.49
    " ions"                0.48
    " in"                  0.46
    " impi"                0.46
    " adversarial"         0.46
    " valuables"           0.45
    Activation Density: 0.204%

    No Known Activations