INDEX
    Explanations

    terms related to scaling

    Negative Logits
    Token                 Logit
    "**/"                 -0.67
    "RetentionPolicy"     -0.61
    "AndEndTag"           -0.59
    "__*/"                -0.58
    "ifolium"             -0.56
    "ICIENCY"             -0.56
    "'][$"                -0.55
    "Gemeinden"           -0.55
    "wój"                 -0.54
    "🏽"                   -0.54
    Positive Logits
    Token                 Logit
    " Scales"             1.38
    " scales"             1.37
    "Scales"              1.30
    " SCALE"              1.27
    " Scale"              1.22
    "scales"              1.20
    "Scale"               1.15
    "SCALE"               1.14
    " scale"              1.09
    " Scal"               1.06
    Act Density 0.080%

    No Known Activations