INDEX
    Explanations

    phrases that convey duality and contrasts

    Negative Logits
    нин         -0.15
    itore       -0.15
    illin       -0.14
    hamster     -0.14
    olin        -0.14
    urator      -0.14
    otine       -0.14
    го          -0.14
    deg         -0.14
     Fox        -0.13
    Positive Logits
     Advisors   0.16
     gul        0.16
    egend       0.14
     suitable   0.14
    'gc         0.13
    Äįin        0.13
    uka         0.13
    uga         0.13
     Vas        0.13
    _Zero       0.13
    Activation Density 0.007%

    No Known Activations