INDEX
    Explanations

    phrases that imply comparison or similarity

    Negative Logits
    artment    -0.16
    ITER       -0.15
     ("        -0.15
    owo        -0.15
    exion      -0.14
     Poh       -0.14
    leted      -0.14
    ington     -0.14
     dri       -0.14
    shal       -0.14
    Positive Logits
     though    0.41
    Though     0.30
     Though    0.30
    though     0.27
     будто     0.22
     tho       0.21
    inine      0.17
     aunque    0.17
     if        0.16
    Tho        0.15
    Activation Density: 0.014%

    No Known Activations