INDEX
    Explanations

    phrases indicating disapproval or violation of rules

    Followed by "(" or "Q" (likely question)

    Negative Logits
     '\\;'          -1.17
    ſelves          -1.14
     ་་             -1.12
    etheless        -1.12
    dafx            -1.12
     $_"            -1.10
    >\<^            -1.10
     ―――――          -1.10
    olesale         -1.05
    BibitemShut     -1.04
    Positive Logits
    <eos>           1.19
    ↵↵              1.05
    (token not shown) 1.04
    ..              0.95
    ↵↵↵             0.94
    ...             0.89
    </em>           0.87
    (whitespace)    0.86
    (whitespace)    0.86
    </h2>           0.86
    Act Density 1.435%

    No Known Activations