INDEX
    Explanations

    consciousness

    Negative Logits
    allows        -0.07
    inh           -0.07
     items        -0.06
    coupon        -0.06
                  -0.06
    ectors        -0.06
     forgive      -0.06
    oin           -0.06
     mouth        -0.06
    condition     -0.06
    Positive Logits
                  0.07
     }));↵        0.07
     yc           0.07
    alleries      0.07
    entropy       0.06
     갤로그        0.06
     대부분        0.06
    .contrib      0.06
    ]});↵         0.06
    ';';          0.06
    Act Density 0.029%

    No Known Activations