INDEX
    Explanations

    concepts related to objective truths and categorical frameworks

    Negative Logits
    Token        Logit
    "ìĿĦ"        -0.20
    "ry"         -0.20
    "ses"        -0.19
    "Ìĥ"         -0.19
    "ers"        -0.19
    "ld"         -0.19
    "maker"      -0.19
    "soever"     -0.19
    "liness"     -0.18
    "ÑĩиÑĤ"      -0.18
    Positive Logits
    Token        Logit
    "pants"      0.17
    " nature"    0.17
    "-minded"    0.16
    "amente"     0.16
    "yt"         0.15
    "zza"        0.15
    "-destruct"  0.15
    "ament"      0.15
    "zion"       0.15
    "vely"       0.15
    Activation Density: 0.150%

    No Known Activations