    Explanations

    phrases related to opinions or stances on various issues

    references to interpretations of moral codes and opinions about societal issues

    Negative Logits
    "Synopsis"    -0.69
    " Called"     -0.64
    "xtap"        -0.64
    "word"        -0.64
    "cour"        -0.60
    "Adds"        -0.60
    "Spoiler"     -0.58
    "Whe"         -0.58
    "Incre"       -0.58
    "Upon"        -0.58
    Positive Logits
    "merce"       0.72
    "ļé"          0.70
    " oneself"    0.69
    "allery"      0.67
    "actual"      0.64
    " Fiat"       0.62
    "gans"        0.62
    "ohm"         0.60
    " actual"     0.60
    " others"     0.60
    Activation Density: 0.772%

    No Known Activations