INDEX
    Explanations

    phrases and words related to knowledge and awareness

    Negative Logits
    OLLOW
    -0.16
    chyb
    -0.16
    .scalablytyped
    -0.15
    umi
    -0.15
    gnu
    -0.15
    _UNUSED
    -0.15
    regon
    -0.14
    jure
    -0.14
    ungen
    -0.14
    不要
    -0.14
    Positive Logits
     zero
    0.48
     absolutely
    0.42
     little
    0.42
     ZERO
    0.39
    zero
    0.39
     Zero
    0.39
    little
    0.37
    -zero
    0.37
    Zero
    0.36
    _zero
    0.34
    Activation Density: 0.198%

    No Known Activations