    Explanations

    descriptions and explanations

    content-bearing words (informational/descriptive tokens) that appear in expository or factual passages.

    Negative Logits
                 -0.06
    ロー          -0.06
    oods         -0.06
     côt         -0.06
     Alberto     -0.06
    voice        -0.05
     blobs       -0.05
    edeki        -0.05
     kvinde      -0.05
    єш           -0.05
    Positive Logits
    ---
    ↵            0.07
    無しさん      0.07
    .UTF         0.07
    ighbour      0.07
     disclosing  0.07
    enci         0.07
     advising    0.06
     related     0.06
     Jong        0.06
    ็นต          0.06
    Act Density 4.470%

    No Known Activations