    NEGATIVE LOGITS
    Token         Value
    ❤️            1.13
    zusammen      1.11
    dans          1.08
    Б             1.05
    lovers        1.03
    capsules      1.03
    Capsules      1.02
    за            1.01
    За            1.01
    с             1.00

    POSITIVE LOGITS
    Token         Value
    debate        1.33
    true          1.25
    reality       1.11
    doesn         1.10
    assertions    1.06
    opinion       1.04
    none          1.04
    not           1.02
    epoch         1.01
    does          1.01

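    One common way feature dashboards derive lists like these (assumed here, not confirmed by this page) is to project the feature's decoder direction onto the model's unembedding matrix, then take the k highest and k lowest scores. The minimal sketch below uses hypothetical helpers `logit_effects` and `top_logits` and invented two-dimensional toy weights; it does not reproduce the values above.

```python
from typing import Dict, List, Tuple

# A (token, score) pair, e.g. ("debate", 1.3).
Entry = Tuple[str, float]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder direction
    with that token's unembedding column (hypothetical helper)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_logits(effects: Dict[str, float], k: int) -> Tuple[List[Entry], List[Entry]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy data: these weights are invented for illustration, not taken from any model.
decoder_dir = [1.0, -0.5]
unembed = {
    "debate": [1.0, -0.6],
    "true": [0.9, -0.6],
    "dans": [-0.8, 0.5],
    "с": [-0.7, 0.6],
}

positive, negative = top_logits(logit_effects(decoder_dir, unembed), k=2)
print("positive:", [(t, round(v, 2)) for t, v in positive])
print("negative:", [(t, round(v, 2)) for t, v in negative])
```

    In a real model the decoder direction and unembedding columns have thousands of dimensions, and a dashboard may apply extra steps (such as a final layer norm) that this sketch leaves out.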
    Activation density: 0.784%
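    Activation density is usually reported as the fraction of token positions on which the feature fires, so 0.784% corresponds to roughly 8 of every 1,000 tokens. The minimal sketch below assumes that "fires" means an activation strictly above zero (as with a ReLU-style SAE encoder) and uses a hypothetical helper `activation_density` with a made-up sample; the dataset behind the figure above is not described on this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 784 active positions out of 100,000 tokens.
acts = [0.0] * 99_216 + [0.5] * 784
density = activation_density(acts)
print(f"Activation density: {density:.3%}")  # prints: Activation density: 0.784%
```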

    No Known Activations