INDEX
    Explanations

    discussing ethical relationships and transformation

    Negative Logits
    "Romantic"      0.48
    "romantic"      0.41
    "💏"            0.40
    " Romantic"     0.39
    " legality"     0.39
    " romantic"     0.39
    " porn"         0.39
    "合法"          0.39
    "Paths"         0.38
    " bloodshed"    0.38
    Positive Logits
    " overcoming"     0.64
    " overcame"       0.63
    " overcome"       0.60
    " overcomes"      0.56
    "克服"            0.55
    " positive"       0.55
    " interesting"    0.53
    " преодо"         0.51
    " friendly"       0.50
    " solving"        0.50
    Act Density 0.122%

    No Known Activations