INDEX
    Explanations

    expressions of contradiction or contrast

    Negative Logits
    Token                 Logit
    "rk"                  -0.16
    "arella"              -0.15
    "erk"                 -0.15
    "ļĮ"                  -0.14
    "Ìĥ"                  -0.14
    "eteria"              -0.14
    "eres"                -0.14
    "äng"                 -0.14
    "ello"                -0.14
    "????????????????"    -0.13

    Positive Logits
    Token                 Logit
    " it"                  0.21
    " they"                0.20
    " there"               0.18
    " fact"                0.18
    " Helm"                0.17
    "wards"                0.17
    " he"                  0.16
    " all"                 0.15
    " Fact"                0.15
    " knowing"             0.15
    Activation Density: 0.040%

    No Known Activations