INDEX
    Explanations

    self-referential phrases and discussions about personal identity

    Negative Logits
    Token       Logit
    "霞"        -0.14
    "ivent"     -0.14
    "oji"       -0.14
    "veau"      -0.14
    "lix"       -0.14
    " TMPro"    -0.14
    "hwnd"      -0.13
    "á»"        -0.13
    "δά"        -0.13
    "llib"      -0.13
    Positive Logits
    Token       Logit
    " love"     0.41
    " LOVE"     0.36
    " likes"    0.33
    "love"      0.33
    " loves"    0.32
    " prefer"   0.31
    "likes"     0.30
    " tend"     0.29
    " Love"     0.28
    "喜欢"      0.28
    Activation Density: 0.800%

    No Known Activations