INDEX
    Explanations

    patterns and repetitions in descriptive phrases

    Negative Logits
    "obi"        -0.16
    "egov"       -0.15
    "监听页面"    -0.15   (Chinese, "monitoring page")
    "prime"      -0.15
    "uy"         -0.15
    "andas"      -0.15
    "263"        -0.15
    "entes"      -0.14
    "ysz"        -0.14
    "itous"      -0.14
    Positive Logits
    "ruž"         0.18
    "umann"       0.16
    "ver"         0.16
    "源"          0.16   (Chinese, "source")
    " source"     0.15
    " Source"     0.15
    "ños"         0.14
    " same"       0.14
    "awei"        0.14
    "same"        0.14
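Token lists like the two above are often produced by projecting a feature's decoder direction onto the model's unembedding matrix. The most negative and most positive scores are then kept. The sketch below assumes that convention. The functions `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical, and a real dashboard may apply extra scaling, such as a final layer norm, that is omitted here.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's
    unembedding column (assumed convention; no layer-norm scaling)."""
    return {tok: sum(d * w for d, w in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive tokens (descending) and the k most
    negative tokens (most negative first)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]
    top = list(reversed(ranked[-k:]))
    return top, bottom

# Hypothetical toy data: a 2-d decoder direction and four vocabulary tokens.
decoder_dir = [0.5, -0.2]
unembed = {
    " source": [0.3, 0.1],
    "obi": [-0.2, 0.3],
    " same": [0.2, -0.1],
    "egov": [-0.1, 0.4],
}
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
for tok, score in top:
    print(f"+ {tok!r} {score:.2f}")
for tok, score in bottom:
    print(f"- {tok!r} {score:.2f}")
```

Keeping leading spaces in the token strings matters here, because `" same"` and `"same"` are distinct vocabulary entries and can score differently.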
    Activation Density: 0.100%
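Activation density is commonly defined as the percentage of token positions where the feature's activation is above zero. Under that definition, 0.100% corresponds to about one firing per thousand tokens. The sketch below assumes this definition. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    acts = list(activations)
    if not acts:
        return 0.0
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)

# Toy example: one nonzero activation in 1,000 token positions.
acts = [0.0] * 999 + [2.3]
print(f"{activation_density(acts):.3f}%")
```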

    No Known Activations