INDEX
    Explanations

    phrases related to various aspects of society, such as culture, reality, work, and beauty

    terms associated with abstract concepts and societal structures

    Negative Logits
    Token            Logit
    "ificantly"      -0.78
    " Important"     -0.66
    "volent"         -0.65
    "orthy"          -0.63
    "untarily"       -0.61
    "orously"        -0.61
    "Important"      -0.60
    "regulated"      -0.60
    "noxious"        -0.59
    "isoft"          -0.59

    Positive Logits
    Token            Logit
    "ounters"        0.82
    "antry"          0.81
    " confines"      0.73
    " afforded"      0.72
    " mund"          0.71
    "iences"         0.65
    "smanship"       0.64
    "forts"          0.63
    "eers"           0.63
    " of"            0.63
    Activation Density: 0.493%

    No Known Activations