INDEX
    Explanations

    terms related to physical struggle or conflict

    Negative Logits
    ç£          -0.16
    wayne       -0.15
    .Label      -0.15
    (Layout     -0.15
    imizi       -0.14
    argent      -0.14
    imizin      -0.14
    POSITE      -0.13
    533         -0.13
     LinkedIn   -0.13
    Positive Logits
    led          0.70
    les          0.63
    ling         0.61
    le           0.57
    ler          0.55
    lers         0.50
    li           0.50
    let          0.49
    lo           0.48
    la           0.47
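    Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking every vocabulary token by the result. The sketch below assumes that approach; the helper names (logit_effects, top_k_logits), the [d_model][vocab_size] layout of the unembedding, and the toy numbers are hypothetical and not taken from this page.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Score every vocabulary token by decoder_dir @ unembed.

    Hypothetical layout: unembed[i][j] is the weight from residual
    dimension i to vocabulary token j (a [d_model][vocab_size] matrix).
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("decoder_dir and unembed must share d_model")
    scores: Dict[str, float] = {}
    for j, token in enumerate(vocab):
        # Dot product of the feature direction with token j's unembedding column.
        scores[token] = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
    return scores


def top_k_logits(
    scores: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most positive, most negative) token/score pairs, k of each."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy numbers for illustration only; not real model weights.
    scores = logit_effects(
        decoder_dir=[1.0, 0.5],
        unembed=[[0.2, -0.1, 0.4], [0.6, 0.0, -0.8]],
        vocab=["led", "wayne", "les"],
    )
    positive, negative = top_k_logits(scores, k=1)
    print("Positive:", positive)
    print("Negative:", negative)
```

    On a real model the same ranking would run over the full vocabulary with k=10 to give ten tokens per list, as shown here.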
    Act Density 0.184%
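    Act Density reports how often the feature fires. Assuming it is the percentage of sampled tokens with a strictly positive activation (the exact threshold and token sample behind this page are not stated here), 0.184% works out to roughly 184 firing tokens per 100,000. A minimal sketch with a hypothetical helper, activation_density:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one token activation")
    # Count tokens where the feature is considered to be firing.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 2 firing tokens out of 1,000.
    acts = [0.0] * 998 + [1.3, 0.7]
    print(f"Act Density {activation_density(acts):.3f}%")
```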

    No Known Activations