    Explanations

    instances of violence and authority interactions in narratives

    Negative Logits
    Token        Logit
    "411"        -0.16
    "ailles"     -0.15
    "emos"       -0.15
    "umi"        -0.15
    "ü"          -0.14
    "avin"       -0.14
    "izik"       -0.14
    "ami"        -0.14
    " bo"        -0.14
    "testing"    -0.13
    Positive Logits
    Token           Logit
    "zo"            0.16
    "claim"         0.16
    " awareness"    0.15
    "Try"           0.15
    "พย"            0.15
    "cken"          0.15
    " getaway"      0.15
    "try"           0.14
    " backups"      0.14
    "ult"           0.14
    Activation Density: 0.015%

    No Known Activations