INDEX
    Explanations

    contrasting phrases that indicate a shift in perspective

    Negative Logits
    Token       Logit
    "orgia"     -0.16
    "aston"     -0.14
    "rai"       -0.14
    "csr"       -0.14
    "gia"       -0.14
    "eč"        -0.14
    "-Token"    -0.14
    "rig"       -0.14
    "ustum"     -0.14
    "/ajax"     -0.14
    Positive Logits
    Token       Logit
    " merely"   0.18
    "ону"       0.17
    "ala"       0.17
    "meer"      0.16
    "zen"       0.16
    "バー"      0.15
    " nor"      0.15
    "ts"        0.15
    " ones"     0.15
    "ajar"      0.15
    Act Density 0.031%

    No Known Activations