INDEX
    Explanations

    the specific term "Ass" with varying activation strengths

    variations of the term "assassin" or related terms

    Negative Logits
    çĦ          -0.72
    AAF         -0.69
    vation      -0.68
     Tobacco    -0.64
     Mercury    -0.63
     WD         -0.63
     Welsh      -0.63
     poppy      -0.62
     MET        -0.62
     smoke      -0.61

    Positive Logits
    assin        1.36
    ociate       1.28
    essor        1.27
    sembly       1.21
    ass          1.21
    alam         1.16
    oci          1.13
    ortment      1.12
    alon         1.09
    essment      1.08
    Activation Density: 0.007%

    No Known Activations