    Negative Logits
    "ate"               0.76
    "\t\t\t\t\t\t"      0.73
    "л"                 0.72
    "================"  0.66
    "AP"                0.60
    "-"                 0.60
    "\t\t\t\t"          0.60
    "<unused60>"        0.59
    "s"                 0.59
    "----------------"  0.58
    Positive Logits
    " lighten"          0.96
    "н"                 0.89
    " lightest"         0.88
    (not rendered)      0.87
    " light"            0.86
    " nhàng"            0.84
    (not rendered)      0.84
    " Light"            0.79
    "Light"             0.78
    "hearted"           0.77
    Activation Density 0.055%

    No Known Activations