    Negative Logits
    Token         Value
    "ul"          0.82
    " I"          0.72
    "ت"           0.68
    "ý"           0.68
    "um"          0.68
    "ز"           0.68
    "ال"          0.65
    "áne"         0.63
                  0.63
    "ו"           0.61
    Positive Logits
    Token         Value
    " utilities"  0.83
    "utility"     0.82
    " utility"    0.76
    " Utility"    0.74
    " on"         0.70
    "utilities"   0.69
    "ی"           0.66
    " Utilities"  0.63
    " sanit"      0.61
    "character"   0.59
    Act Density 0.001%

    No Known Activations