    Explanations

    phrases that indicate the concept of beginnings or starts

    Negative Logits
    "ola"       -0.17
    "ede"       -0.16
    "URA"       -0.16
    "alie"      -0.16
    "енту"      -0.15
    "ed"        -0.15
    "gren"      -0.14
    "gmt"       -0.14
    "ulence"    -0.14
    "enton"     -0.14
    Positive Logits
    "nings"         0.38
    "/end"          0.32
    "-middle"       0.28
    "ning"          0.24
    " stages"       0.24
    "NING"          0.22
    "-stage"        0.21
    " steps"        0.20
    "/Foundation"   0.20
    ",end"          0.17
    Activation Density: 0.035%

    No Known Activations