INDEX
    Explanations

    terms related to virtue and virtuous behaviors

    Negative Logits
    itz          -0.16
    lds          -0.15
    prehensive   -0.15
    lectic       -0.15
    atters       -0.15
     xét         -0.15
    á»Ĩ          -0.15
    é»           -0.15
    ary          -0.14
    ål           -0.14
    Positive Logits
    ually        0.29
    uous         0.22
     virt        0.20
    ues          0.19
    ue           0.19
    oso          0.17
    tual         0.17
    utos         0.17
    usize        0.17
    udes         0.17
    Activation Density: 0.003%

    No Known Activations