    Explanations

    harmful thoughts or urges

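    Negative Logits
    " permeates"       0.52
    " vanishes"        0.46
    " включает"        0.46   (Russian: "includes")
    " является"        0.45   (Russian: "is")
    " encompasses"     0.44
    " ہونا"            0.44   (Urdu: "to be")
    " применяется"     0.44   (Russian: "is applied")
    " itself"          0.43
    " varies"          0.43
    " disappears"      0.43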
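    Positive Logits
    " want"            0.98
    " have"            0.97
    "觉得"             0.90   (Chinese, simplified: "to feel, to think")
    "覺得"             0.84   (Chinese, traditional: "to feel, to think")
    " wanted"          0.79
    " know"            0.79
    " knew"            0.77
    " merasa"          0.77   (Indonesian/Malay: "to feel")
    "have"             0.76
    " feel"            0.73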
    Activation Density: 0.015%

    No Known Activations