    Explanations

    phrases related to beginnings or starting points

    Negative Logits
    Token     Logit
    ague      -0.15
    uida      -0.14
    acute     -0.14
    ево       -0.14
    ½         -0.14
    uil       -0.14
    uty       -0.14
    inez      -0.14
    ãĥ³ãĥģ    -0.13
    utch      -0.13
    Positive Logits
    Token     Logit
    ge        0.32
    ges       0.31
    gew       0.28
    gesch     0.28
    ged       0.26
    gel       0.26
    geb       0.26
    gest      0.26
    ger       0.25
    z         0.24
    Activation Density: 0.012%

    No Known Activations