INDEX
    Explanations

    phrases related to one-on-one interactions

    Negative Logits
    "iris"        -0.16
    " Fifth"      -0.14
    "第"          -0.14
    "419"         -0.14
    "xbb"         -0.14
    "ÅĻes"        -0.14
    "utsch"       -0.13
    "isl"         -0.13
    "078"         -0.13
    " Seventh"    -0.13
    Positive Logits
    "won"         0.30
    " oe"         0.29
    " onc"        0.29
    "Won"         0.28
    "-one"        0.27
    " Won"        0.27
    " Onc"        0.27
    " won"        0.27
    "ìĽIJ"        0.25
    " on"         0.24
    Act Density 0.046%

    No Known Activations