    NEGATIVE LOGITS
    Token                 Logit
    "DockStyle"           -0.72
    "ConstraintMaker"     -0.65
    " AssemblyTitle"      -0.64
    " للمعارف"            -0.62
    "AddTagHelper"        -0.61
    "TemporalType"        -0.61
    "Autoritní"           -0.60
    "SBATCH"              -0.60
    "RectangleBorder"     -0.60
    "Hentet"              -0.59
    POSITIVE LOGITS
    Token                 Logit
    " incorrect"          1.64
    "Incorrect"           1.38
    "incorrect"           1.38
    " Incorrect"          1.38
    " incorrectly"        1.30
    " wrong"              1.02
    " inaccurate"         0.96
    " erroneous"          0.85
    " falsche"            0.84
    " Wrong"              0.84
    Activation Density: 0.003%

    No Known Activations