    Negative Logits
    " can"          0.91
    " provide"      0.87
    " "             0.85
    " Internet"     0.84
    " RAID"         0.84
    " negotiate"    0.81
    " illegal"      0.81
    " explain"      0.80
    " police"       0.80
    " Against"      0.80
    Positive Logits
    (token missing) 1.21
    " любви"        1.18
    "love"          1.11
    "❤️"            1.08
    "💗"            1.08
    "😍"            1.07
    "💞"            1.03
    " apaixon"      1.02
    "💝"            1.02
    "LOVE"          1.01
    Activation Density: 0.270%

    No Known Activations