INDEX
    Explanations

    abstract concepts or goals

    concepts and terms related to objectives or claims

    Negative Logits
    "ctors"       -0.67
    "umbn"        -0.67
    "idth"        -0.63
    "utters"      -0.61
    "idates"      -0.61
    "headers"     -0.60
    "ãĥ³ãĤ¸"      -0.60
    " condem"     -0.59
    " srf"        -0.59
    "usha"        -0.57
    Positive Logits
    " ourselves"        0.95
    " myself"           0.90
    " firsthand"        0.82
    ".<"                0.76
    " yourself"         0.75
    " unconsciously"    0.73
    " vividly"          0.73
    " empir"            0.72
    "ality"             0.71
    " manually"         0.71
    Act Density 0.240%

    No Known Activations