INDEX
    Explanations

    phrases that indicate change or transformation

    Negative Logits
    sez
    -0.16
     Anyone
    -0.14
    ison
    -0.14
    geme
    -0.13
    ispers
    -0.13
    інки
    -0.13
    å¯
    -0.13
    eto
    -0.13
    ayo
    -0.13
     никто
    -0.13
    Positive Logits
     everything
    1.38
    everything
    1.23
     Everything
    1.16
    Everything
    1.11
     tudo
    0.95
     alles
    0.87
    一切
    0.77
     tutto
    0.65
     anything
    0.64
    anything
    0.58
    Act Density 0.526%

    No Known Activations