INDEX
    Explanations

    expressions of love and enjoyment towards various subjects

    expressing strong positive feeling

    Negative Logits
    [token missing]    -0.45
    ' счет'            -0.43
    ' Sycamore'        -0.42
    'tabPage'          -0.42
    ']")]'             -0.41
    [token missing]    -0.40
    'asu'              -0.40
    'TagMode'          -0.40
    ' treason'         -0.40
    ' счёт'            -0.40
    Positive Logits
    ' loved'     1.52
    ' Loved'     1.48
    'Loved'      1.44
    'loved'      1.41
    ' LOVED'     1.30
    ' liked'     1.20
    'liked'      1.02
    ' Liked'     0.99
    'Liked'      0.97
    'LookAnd'    0.81
    Act Density 0.003%

    No Known Activations