INDEX
    Explanations

    phrases that indicate comparisons or similarities

    Negative Logits
    acco
    -0.18
    them
    -0.17
    Them
    -0.15
    IGHL
    -0.15
     eux
    -0.14
    orsi
    -0.14
     nä
    -0.14
    ことに
    -0.14
    rou
    -0.14
    ображ
    -0.13
    Positive Logits
     they
    0.28
     there
    0.26
     it
    0.25
     something
    0.22
     someone
    0.22
    able
    0.21
     nothing
    0.21
     part
    0.20
     we
    0.19
     somebody
    0.19
    Act Density 0.051%

    No Known Activations