    NEGATIVE LOGITS
    etc     0.96
    py      0.91
    T       0.87
    ag      0.85
    dr      0.85
    f       0.85
    L       0.84
    dis     0.84
    von     0.83
    v       0.83
    POSITIVE LOGITS
     sided          1.18
     hearted        1.14
     centric        1.13
     inducing       1.06
     minded         1.05
     durée          1.00
     Oriented       0.99
     cuddling       0.98
     orientated     0.98
     indignation    0.96
    Activation Density: 0.128%

    No Known Activations