    Explanations

    phrases that indicate conflict or truthfulness in arguments

    Negative Logits
    utz         -0.15
    ale         -0.14
     Stanton    -0.14
    .intellij   -0.14
    .mvc        -0.13
     deduction  -0.13
    qu          -0.13
     Unt        -0.13
     пи         -0.13
    Joined      -0.12
    Positive Logits
    етель       0.17
    emmel       0.16
    edo         0.16
    oire        0.16
    usher       0.16
    emm         0.15
    àm          0.15
     ones       0.15
     Entr       0.15
    /effects    0.14
    Activation Density 0.193%

    No Known Activations