    Explanations

    phrases that indicate comparisons or contrasting ideas

    Negative Logits
    Token       Logit
    "inous"     -0.15
    "TECTED"    -0.14
    ".xhtml"    -0.14
    "tes"       -0.13
    " (),↵"     -0.13
    "uzu"       -0.13
    "à¸ļาย"    -0.13
    "ardless"   -0.13
    "üh"        -0.13
    "atcher"    -0.13
    Positive Logits
    Token       Logit
    " latter"   1.61
    " Latter"   0.72
    " former"   0.61
    " later"    0.58
    "former"    0.49
    " Later"    0.45
    "later"     0.44
    " Former"   0.44
    "Later"     0.43
    "Former"    0.41
    Activation Density: 0.287%

    No Known Activations