    Negative Logits
        "-panel"          -0.07
        " zs"             -0.07
        " Farrell"        -0.06
        "---↵↵"           -0.06
        "Arr"             -0.06
        "คอม"             -0.06
        " Nash"           -0.06
        " respectfully"   -0.06
        " Guest"          -0.06
        "温度"            -0.06
    Positive Logits
        " like"            0.14
        " LIKE"            0.12
        " Like"            0.12
        "like"             0.11
        "Like"             0.11
        "-like"            0.10
        " unlike"          0.09
        "Love"             0.08
        " comme"           0.08
        "LIKE"             0.07
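Token lists like the positive and negative logits above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The following is a minimal pure-Python sketch under that assumption only. The names `decoder_dir`, `W_U`, `vocab`, `logit_effects`, and `top_logits` and all of the toy numbers are hypothetical and are not taken from this feature or from any particular tool.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], W_U: List[List[float]]) -> List[float]:
    """Score each vocabulary token as the dot product of the (hypothetical)
    decoder direction with that token's column of the unembedding matrix.
    W_U is assumed to have shape d_model x vocab_size."""
    d_model = len(decoder_dir)
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_logits(
    decoder_dir: List[float], W_U: List[List[float]], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    effects = logit_effects(decoder_dir, W_U)
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative score first
    positive = ranked[::-1][:k]  # most positive score first
    return positive, negative


# Made-up toy data: d_model = 2, a vocabulary of four tokens.
vocab = [" like", "Love", " Nash", "-panel"]
W_U = [
    [1.0, 0.5, -0.2, -0.3],
    [0.2, 0.1, 0.0, -0.4],
]
decoder_dir = [0.1, 0.2]

positive, negative = top_logits(decoder_dir, W_U, vocab, k=2)
for token, value in positive + negative:
    print(f"{token!r:>10} {value:+.2f}")
```

Sorting once and slicing both ends keeps the sketch short. A real dashboard working over a vocabulary of tens of thousands of tokens would more likely use a vectorized matrix product and a partial top-k selection.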
    Activation Density: 0.059%
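The activation density figure presumably reports how often the feature fires. Here is a minimal sketch, assuming density means the percentage of sampled tokens on which the feature's activation is strictly above zero. The function name `activation_density` and the sample counts are hypothetical. Under that reading, 0.059% corresponds to roughly 59 firing tokens per 100,000 sampled.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of tokens on which the feature's activation is above zero
    (assumed definition of density)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Made-up sample: 59 firing tokens out of 100,000.
sample = [0.0] * 100_000
for i in range(59):
    sample[i * 1000] = 1.5

print(f"Activation density: {activation_density(sample):.3f}%")
```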

    No Known Activations