INDEX
    Explanations

    references to societal norms and their impact on behavior and morality

    Negative Logits
     matel            -0.57
     ujednoznacz      -0.55
    __":              -0.50
    enderror          -0.48
     Remover          -0.47
    ónimos            -0.47
     bạch             -0.47
     understatement   -0.46
     tricot           -0.46
    __':              -0.46
    Positive Logits
     unravel          0.78
     collapsed        0.78
     nose             0.77
     tank             0.77
     imp              0.76
    BeginInit         0.76
     fal              0.75
     collapsing       0.73
     collapse         0.72
     crumbled         0.72
    Activation Density: 0.399%

    No Known Activations