INDEX
    Explanations

    phrases related to intentions and morality

    Negative Logits
    cona        -0.20
    roud        -0.16
     Seks       -0.16
    bable       -0.15
     ellipse    -0.14
    earch       -0.14
    kop         -0.14
    éĽij         -0.14
    ese         -0.14
    phere       -0.14
    Positive Logits
     McCart     0.15
    usch        0.15
    ather       0.14
    .shtml      0.14
    ubs         0.14
    rippling    0.14
    enus        0.14
    402         0.14
    á»ģ          0.14
    羣          0.13
    Activation Density: 0.247%

    No Known Activations