INDEX
    Explanations

    Words that people frequently use when chatting or being interviewed.

    NEGATIVE LOGITS
    '       -3.13
     '      -1.87
    '.      -1.63
    $'      -1.62
    )'      -1.55
    .'      -1.54
    '"      -1.52
    \'      -1.49
    }'      -1.45
    '...    -1.45
    POSITIVE LOGITS
    ”)      1.05
    ”).     1.02
    .”)     0.98
    ”),     0.97
    .”      0.97
    ?”      0.91
    ”.      0.91
    ,”      0.90
    ”,      0.88
    ”]      0.85
    Activation Density 54.927%

    No Known Activations