INDEX
    Explanations

    words ending in y or ergy

    Negative Logits
    ivity    0.68
    speople    0.66
    swith    0.59
    ानंतर    0.59
    ामध्ये    0.57
    LY    0.53
    ాన్ని    0.52
    IVITY    0.51
    ामुळे    0.51
    ssss    0.49
    Positive Logits
    yyyy    0.88
    yyyyyyyy    0.88
    ுள்ளார்    0.73
    yy    0.72
    ுள்ளனர்    0.64
    ء    0.57
    ுள்ள    0.55
    outube    0.53
    й    0.52
    ٰ    0.52
    Activation Density: 0.458%

    No Known Activations