INDEX
    Explanations

    Instances of the word "official."

    Negative Logits
     latter       -1.55
    obacterium    -1.53
    oyle          -1.45
    rose          -1.44
    angers        -1.39
    abeth         -1.38
    anson         -1.38
    ellow         -1.34
    gado          -1.34
    yman          -1.33
    Positive Logits
    dom           1.98
    doms          1.76
    ships         1.73
    pieces        1.67
     bodies       1.56
    blems         1.52
    endar         1.50
    hood          1.49
    ities         1.46
    esses         1.44
    Activation Density: 0.177%

    No Known Activations