INDEX
    Explanations

    references to heroes and heroic figures

    Negative Logits
     addslashes   -0.16
    ment          -0.16
    roje          -0.14
    azioni        -0.14
    irector       -0.14
    isseur        -0.14
    kart          -0.14
    ъ             -0.14
    gang          -0.14
    raj           -0.14
    Positive Logits
    ines          0.19
    ics           0.18
    ically        0.17
    lix           0.17
    ized          0.17
    ism           0.17
    izable        0.16
    ine           0.16
    осред         0.15
    avirus        0.15
    Activation Density 0.029%

    No Known Activations