INDEX
    Explanations

    phrases indicating uncertainty and limited knowledge about a subject

    Negative Logits
    "リア"       -0.16
    "olle"       -0.16
    "lee"        -0.14
    "LEE"        -0.14
    "朋"         -0.13
    " Shields"   -0.13
    "aurus"      -0.13
    "hol"        -0.13
    "roit"       -0.13
    ".shapes"    -0.13
    POSITIVE LOGITS
    " known"       0.34
    " knowledge"   0.32
    " know"        0.29
    "known"        0.28
    "knowledge"    0.28
    " Known"       0.27
    "-known"       0.27
    " Knowledge"   0.26
    "Knowledge"    0.25
    "Known"        0.25
    Activation Density: 0.261%

    No Known Activations