INDEX
    Explanations

    phrases indicating contrast or exceptions in discussions

    Negative Logits
    " behalf"    -0.16
    "abyrin"     -0.13
    "ãģĦãĤĭ"     -0.13
    "up"         -0.13
    "BF"         -0.13
    " pož"       -0.13
    " Gord"      -0.13
    "ạp"         -0.13
    "ãģĭãĤĬ"     -0.13
    "zone"       -0.13
    Positive Logits
    " aside"     0.23
    "aside"      0.22
    " Aside"     0.21
    " Apart"     0.19
    " apart"     0.19
    "Apart"      0.18
    "Aside"      0.17
    "ought"      0.17
    "jÅ¡ÃŃ"      0.17
    "rics"       0.15
    Act Density 0.020%

    No Known Activations