    Explanations

    phrases and words that indicate comparison or contrast in context

    Negative Logits
    ().'/       -0.14
    anny        -0.14
     Natasha    -0.14
    ideas       -0.14
     Shepard    -0.14
    ]){         -0.14
    412         -0.13
    意思        -0.13
    стоя        -0.13
    itime       -0.13
    Positive Logits
     typical    0.22
     Typical    0.20
    typically   0.17
     Typically  0.16
    _default    0.16
    典          0.16
    typ         0.15
    _DEFAULT    0.15
    .default    0.15
     typically  0.15
    Act Density 0.004%

    No Known Activations