INDEX
    Explanations

    exclamatory phrases or emotional responses

    Negative Logits
    "):     -1.07
    `,      -1.06
    .")     -1.04
    )");    -1.02
    ".      -1.01
    $")     -0.98
    '),     -0.93
    "},     -0.93
    '):     -0.92
    }")     -0.92

    Positive Logits
    !       3.13
    !!      2.53
    !!!     2.47
     !      2.31
    !!!!    2.20
            2.19
    !)      2.19
    !"      2.17
    !”      2.08
    !!!!!   2.04
    Activation Density: 0.867%

    No Known Activations