INDEX
    Explanations

    references to "ponies" and related terms

    Negative Logits
    Token      Logit
    "ODE"      -0.15
    "gne"      -0.15
    "llib"     -0.15
    "oq"       -0.15
    "iales"    -0.15
    "арод"     -0.14
    "erç"      -0.14
    "esk"      -0.14
    "ulses"    -0.14
    "eks"      -0.14
    Positive Logits
    Token         Logit
    " pon"        0.27
    " Pon"        0.25
    "pon"         0.24
    "pons"        0.19
    "emon"        0.18
    "entially"    0.16
    "yp"          0.16
    "ymous"       0.16
    " poz"        0.16
    "pok"         0.15
    Act Density 0.007%

    No Known Activations