    Explanations

    concepts related to societal issues and responsibility

    Word followed by punctuation or a function word

    Negative Logits
    Token     Logit
    */;       -0.91
    enumii    -0.89
    ]--;      -0.85
    }\]       -0.84
    ()")      -0.83
    %");      -0.82
    )";       -0.80
    |}{$      -0.80
    ()");     -0.77
    $")       -0.76

    Positive Logits
    Token     Logit
     ftw      1.12
     FTW      1.09
     anyone   1.03
     galore   1.00
     indeed   0.95
    !         0.93
    ?         0.90
     anybody  0.85
     yes      0.81
    ?!        0.76
    Activation Density: 0.372%

    No Known Activations