INDEX
    Explanations

    phrases indicating knowledge or understanding

    Negative Logits
    Token       Logit
    "entes"     -0.15
    "rift"      -0.15
    "hood"      -0.15
    "IFI"       -0.15
    "ype"       -0.15
    "quist"     -0.14
    "umps"      -0.14
    "elah"      -0.14
    "ublic"     -0.14
    "ikes"      -0.14
    Positive Logits
    Token         Logit
    " how"        0.18
    " about"      0.18
    " enough"     0.17
    " what"       0.15
    " nothing"    0.15
    "loff"        0.15
    " dist"       0.14
    ".Unknown"    0.14
    " biết"       0.14
    "basic"       0.14
    Activation Density: 0.095%

    No Known Activations