    Explanations

    adjectives and adverbial phrases that describe characteristics or behaviors

    Negative Logits
    Token          Logit
    "unger"        -0.16
    " citizen"     -0.15
    "uner"         -0.14
    "viewer"       -0.14
    "essenger"     -0.14
    "rray"         -0.13
    "celik"        -0.13
    "ileo"         -0.13
    "lever"        -0.13
    " Rays"        -0.13
    Positive Logits
    Token          Logit
    " TORT"        0.16
    "endez"        0.15
    "پس"           0.15
    "æ¾"           0.14
    "chy"          0.14
    " Pall"        0.14
    "æĸ¯çī¹"       0.14
    "nell"         0.14
    " gord"        0.14
    "ulu"          0.14
    Activation Density: 0.004%

    No Known Activations