INDEX
    Explanations

    comparative phrases indicating quality or similarity

    Negative Logits
    Token          Logit
    "ho"           -0.18
    "ha"           -0.15
    "ant"          -0.14
    " Editors"     -0.14
    "©"            -0.14
    " Leer"        -0.14
    "615"          -0.13
    "ÏĦÏĥ"         -0.13
    " ander"       -0.13
    "ingo"         -0.13

    Positive Logits
    Token              Logit
    " than"            0.19
    "-than"            0.18
    " THAN"            0.16
    "Than"             0.15
    "Ú©Ùĩ"             0.15
    "ulary"            0.15
    "ThanOrEqualTo"    0.14
    "than"             0.14
    "ething"           0.14
    "_than"            0.14
    Activation Density: 0.010%

    No Known Activations
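
    Two of the logit tokens listed on this page (ÏĦÏĥ and Ú©Ùĩ) look like byte-level BPE token strings rather than corrupted text. The page does not name the tokenizer, so the following is only a minimal sketch, assuming a GPT-2-style byte-to-unicode table. The helper names bytes_to_unicode and decode_token are illustrative and not part of any tool referenced here. The sketch maps each displayed character back to its raw byte and decodes the result as UTF-8.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Build a GPT-2-style table mapping each byte value to a printable character.

    Assumption: the tokens on this page use this table; the page does not say so.
    """
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Remaining bytes (control characters, space, etc.) are shifted past U+00FF.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(cp) for b, cp in zip(byte_values, code_points)}


# Inverse table: displayed character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable text."""
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Characters outside the table (e.g. a literal space) pass through.
            raw.extend(ch.encode("utf-8"))
    # Tokens can split a multi-byte character, so replace invalid sequences.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ÏĦÏĥ", "Ú©Ùĩ", "ThanOrEqualTo"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

    Under this assumption the two strings decode to short Greek and Persian fragments, while plain ASCII tokens such as ThanOrEqualTo are unchanged. If the model uses a different tokenizer, the mapping would need to be swapped accordingly.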