    Explanations

    terms related to scaling or measurement

    Negative Logits
    Token          Logit
    "**/\n"        -0.79
    " Pute"        -0.73
    "iotti"        -0.70
    " Чита"        -0.70
    "gibt"         -0.69
    "phite"        -0.69
    ",:);"         -0.68
    "🏻"            -0.67
    " Dempsey"     -0.66
    "%%\n"         -0.66
    Positive Logits
    Token          Logit
    " scales"      1.60
    " Scales"      1.55
    " SCALE"       1.51
    "Scales"       1.47
    " Scale"       1.46
    "Scale"        1.43
    " scale"       1.42
    "scales"       1.39
    "SCALE"        1.33
    "scale"        1.26
    Activation Density: 0.074%

    No Known Activations