INDEX
    Explanations
    NEGATIVE LOGITS
     percentages    -0.07
     uppercase      -0.07
    ское            -0.07
    (names          -0.06
     [])↵↵          -0.06
     labor          -0.06
    (files          -0.06
     Setter         -0.06
    "go             -0.06
     Jackie         -0.06
    POSITIVE LOGITS
     reveals        0.12
     reveal         0.11
     revealed       0.09
     revealing      0.08
     confirmed      0.07
     reve           0.07
    enin            0.07
     сообщ          0.07
     disclosing     0.07
     revised        0.07
    Activation Density: 0.012%

    No Known Activations