INDEX
    Explanations

    descriptors of character traits and social status

    Negative Logits
    rof       -0.16
    ered      -0.16
    æīį       -0.15
    zon       -0.14
    ering     -0.14
    omor      -0.14
    nty       -0.14
     cara     -0.14
    oner      -0.14
    cope      -0.13
    Positive Logits
    Äįet      0.17
    olest     0.16
    aren      0.14
    _ble      0.14
    _recall   0.14
    .pp       0.13
    견        0.13
    lingen    0.13
    uebas     0.13
    _locals   0.13
    Activation Density: 0.009%

    No Known Activations