INDEX
    Explanations

    mentions of the name "Alex" or its variations in the text

    Negative Logits
    otate      -0.19
    pras       -0.17
    geh        -0.17
    iglia      -0.16
    JECTION    -0.16
    ly         -0.15
    loe        -0.14
    att        -0.14
    ÑģÑı       -0.14
    uche       -0.14
    Positive Logits
    andra      0.31
    andro      0.25
    andr       0.25
    andre      0.24
    ander      0.22
    ei         0.21
    anders     0.19
    jandro     0.18
    opoulos    0.17
    anian      0.17
    Activation Density: 0.013%

    No Known Activations