    Explanations

    significant concepts related to personal or moral integrity

    Negative Logits
    Token         Logit
    "YC"          -0.19
    "ogan"        -0.19
    "enis"        -0.16
    "elier"       -0.15
    "vince"       -0.15
    " Linh"       -0.14
    "δή"          -0.14
    "coles"       -0.14
    "meni"        -0.14
    "usch"        -0.14
    Positive Logits
    Token         Logit
    " either"      0.27
    " EITHER"      0.24
    " Either"      0.21
    "either"       0.20
    "Either"       0.18
    " somewhere"   0.15
    "ンズ"          0.15
    "ek"           0.15
    "ither"        0.14
    "ortho"        0.14
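Top-logit lists like these are commonly produced by projecting a feature's output direction through the model's unembedding matrix and keeping the most positive and most negative entries. The sketch below assumes that setup: a sparse-autoencoder-style feature with a decoder vector `w_dec` and an unembedding matrix `W_U`, both hypothetical names here. It uses a toy five-token vocabulary with made-up numbers, not the real weights behind this page.

```python
from typing import List, Tuple


def top_logits(
    w_dec: List[float],        # hypothetical feature decoder direction, length d_model
    W_U: List[List[float]],    # hypothetical unembedding: d_model rows x vocab_size columns
    vocab: List[str],          # token strings, length vocab_size
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative direct logit effects."""
    d_model = len(w_dec)
    # logits[j] = sum_i w_dec[i] * W_U[i][j]: the feature direction dotted with each token's column
    logits = [
        sum(w_dec[i] * W_U[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, logits))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example: d_model = 2, vocabulary of 5 tokens (illustrative values only).
vocab = [" either", "ek", "YC", "ogan", " the"]
W_U = [
    [0.9, 0.3, -0.4, -0.2, 0.0],
    [0.1, 0.2, -0.5, -0.6, 0.05],
]
w_dec = [0.3, 0.1]

pos, neg = top_logits(w_dec, W_U, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

Sorting the full list is fine for a toy vocabulary. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` avoids sorting everything.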
    Activation Density: 0.003%
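Activation density is commonly reported as the percentage of evaluated tokens on which the feature fires, meaning its activation is above zero. At 0.003%, that is about 3 tokens in every 100,000. Below is a minimal sketch that assumes a flat list of per-token activations and a firing threshold of zero. The function name and the synthetic data are illustrative only.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed firing rule)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic example: 100,000 tokens, and the feature fires on 3 of them.
acts = [0.0] * 100_000
for idx in (10, 5_000, 99_999):
    acts[idx] = 1.5

print(f"{activation_density(acts):.3f}%")  # 0.003%
```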

    No Known Activations