INDEX
    Explanations

    titles, particularly those with the word "The" in them, or words related to black people

    Negative Logits
    <bos>           -1.11
    RegressionTest  -0.66
    ')):            -0.65
    "}>             -0.59
    '):             -0.59
    $")             -0.58
    ")){            -0.57
     nemlig         -0.57
    Guys            -0.57
    '})             -0.57

    Positive Logits
     Ruhm           0.62
     joaat          0.60
     энциклопедия   0.60
     Peasant        0.59
     EdgeInsets     0.59
    EndContext      0.58
     henvisninger   0.58
     adultery       0.58
    pertory         0.57
     interrogation  0.56

    Act Density 1.463%

    No Known Activations