INDEX
    Explanations

    words related to intelligence and judgment (e.g., 'stupid', 'dumb', 'smart')

    Negative Logits
    riott       -0.81
    AUT         -0.79
    accompan    -0.76
    APH         -0.76
    apers       -0.75
    ILA         -0.74
    orthy       -0.73
    Reviewed    -0.70
    OHN         -0.68
    20439       -0.66
    Positive Logits
    founded     1.16
    found       0.97
    nesses      0.90
    ness        0.90
    fuck        0.89
    ly          0.88
    est         0.87
    asses       0.85
    itude       0.84
    stru        0.84
    Activation Density 0.036%

    No Known Activations