    Explanations

    phrases related to societal structures or moral dilemmas

    Negative Logits
    Token        Logit
    "Ô"          -0.83
    "ãĥķãĤ©"     -0.80
    "ãĤ¦ãĤ¹"     -0.76
    "Scale"      -0.74
    "Ext"        -0.73
    "prints"     -0.72
    " Ext"       -0.70
    "Sac"        -0.69
    "âĸĵ"        -0.69
    "ovember"    -0.69
    Positive Logits
    Token        Logit
    " person"    1.23
    " girl"      1.16
    " guy"       1.15
    " woman"     1.12
    " man"       1.07
    "guy"        0.99
    " persons"   0.98
    " Person"    0.97
    " lady"      0.96
    " spouse"    0.96
    Activation Density: 0.632%

    No Known Activations
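
Several entries in the negative-logit list (for example ãĥķãĤ© and âĸĵ) look like raw byte-level BPE token strings rather than readable text. The sketch below shows one way to map such strings back to text. It assumes the tokens use the GPT-2 style byte-to-character table, which the page itself does not confirm; `bytes_to_unicode` and `decode_token` are illustrative names, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Byte -> printable character table, as in GPT-2 style byte-level BPE (assumed)."""
    # Bytes that are already printable keep their own code point.
    keep = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    table = {b: chr(b) for b in keep}
    # Every other byte is shifted to 256, 257, ... in ascending byte order.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Turn a displayed token string back into text (hypothetical helper)."""
    # Strings with characters outside the table (e.g. a literal leading
    # space) are assumed to be readable already and are returned unchanged.
    if any(ch not in CHAR_TO_BYTE for ch in display):
        return display
    raw = bytes(CHAR_TO_BYTE[ch] for ch in display)
    # A token may hold only part of a UTF-8 sequence; mark such bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥķãĤ©", "ãĤ¦ãĤ¹", "âĸĵ", "Scale"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ãĥķãĤ© and ãĤ¦ãĤ¹ decode to the katakana fragments フォ and ウス, and âĸĵ decodes to the block character ▓. A lone Ô corresponds to a single byte that is not valid UTF-8 on its own, so it stays undecodable.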