    Explanations

    Violence-related words and phrases

    Negative Logits (tokens quoted; a leading space is part of the token)
      Token           Logit
      "ÄŁ"            -0.62
      " guesses"      -0.61
      " Tsukuyomi"    -0.59
      "edin"          -0.59
      " afar"         -0.57
      " Paula"        -0.57
      " Mb"           -0.56
      "ij士"          -0.55
      "omething"      -0.54
      "vous"          -0.54

    Positive Logits
      Token           Logit
      "rice"          1.03
      "hemat"         1.03
      "hetically"     0.98
      "terson"        0.98
      "abase"         0.98
      "hens"          0.95
      "rix"           0.93
      "ric"           0.93
      "itudes"        0.93
      "roph"          0.92
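    The page does not say how these logit rankings are produced. A common approach, assumed here, is to score every vocabulary token by the dot product of the feature's decoder direction with that token's unembedding vector, then list the highest and lowest scores. The sketch below illustrates that assumption only; the names `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom`, and the toy 2-dimensional vectors, are hypothetical and not taken from the dashboard.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Score each token by the dot product of the (hypothetical) feature decoder
    direction with that token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]  # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example with made-up numbers; a real model would use
    # the full residual-stream width and the whole vocabulary.
    decoder_dir = [1.0, -0.5]
    unembed = {
        "rice": [1.0, 0.0],
        " guesses": [-0.5, 0.5],
        "abase": [0.2, -1.0],
    }
    positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
    print("positive:", positive)
    print("negative:", negative)
```

    Real dashboards may also fold in normalization or scaling steps before ranking; this sketch shows only the projection-and-sort idea.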
    Activation density: 0.032%
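    Activation density is usually read as the fraction of tokens in a sample on which the feature's activation is above zero. Assuming that definition (the page does not state it), 0.032% corresponds to roughly 32 firings per 100,000 tokens. The function `activation_density` and the sample below are hypothetical illustrations, not the dashboard's own code.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature activation exceeds `threshold`."""
    values = list(activations)
    if not values:
        raise ValueError("activation_density needs at least one token")
    fired = sum(1 for a in values if a > threshold)
    return fired / len(values)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 32 of 100,000 tokens.
    sample = [0.0] * 99_968 + [1.5] * 32
    print(f"Activation density: {activation_density(sample):.3%}")
```

    With this sample the script prints `Activation density: 0.032%`, matching the figure above in scale only; the underlying token counts are invented for illustration.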

    No Known Activations