INDEX
    Explanations

    questions or uncertainty about choices and conditions

    Negative Logits
    不了
    -0.19
    him
    -0.18
     ним
    -0.16
     herself
    -0.16
    Them
    -0.15
    不到
    -0.15
     NEVER
    -0.15
    ed
    -0.15
     eux
    -0.15
     asla
    -0.14
    Positive Logits
    /how
    0.56
     there
    0.50
     they
    0.50
     it
    0.46
     we
    0.42
     anyone
    0.40
     anybody
    0.38
    /if
    0.37
     anything
    0.34
    there
    0.34
    Act Density 0.084%

    No Known Activations