    Explanations

    personal expressions of preference or opinion

    Negative Logits
    Token        Logit
    "견"         -0.16
    "イド"       -0.15
    "het"        -0.14
    "oji"        -0.14
    "ITE"        -0.14
    "marvin"     -0.14
    "造"         -0.13
    " kvinne"    -0.13
    "trak"       -0.13
    "ウォ"       -0.13
    Positive Logits
    Token            Logit
    " like"          0.24
    " typically"     0.22
    " personally"    0.21
    " usually"       0.20
    " prefer"        0.18
    " recently"      0.18
    " likes"         0.17
    "typically"      0.17
    "agr"            0.16
    " find"          0.16
    Act Density 0.072%

    No Known Activations