    Negative Logits
    Token       Logit
    " Merz"     -0.59
    "a"         -0.55
    "gre"       -0.54
    " Merri"    -0.53
    "color"     -0.53
    "Color"     -0.52
    " Color"    -0.51
    " Mif"      -0.50
    " స"        -0.49
    " color"    -0.49
    Positive Logits
    Token        Logit
    " Colors"    1.64
    "Colors"     1.43
    " colors"    1.40
    " Colours"   1.29
    " COLORS"    1.28
    "colors"     1.25
    " colours"   1.24
    "Colours"    1.11
    "COLORS"     1.11
    " colores"   1.09
    Activation Density: 0.094%

    No Known Activations