    Explanations

    Expressions of frustration or disappointment

    Responses to questions/statements

    Negative Logits
    ]),       -1.13
    ".        -1.05
    '),       -1.00
    "),       -1.00
     ")");    -0.97
    %");      -0.97
    "},       -0.96
    ")));     -0.96
     $_"      -0.96
    "],       -0.94

    Positive Logits
     yeah     0.98
     maybe    0.94
    ...       0.92
     sorry    0.92
     Maybe    0.90
     haha     0.90
    ....      0.89
     :)       0.87
     gonna    0.86
     you      0.86
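    The page does not say how these logit values were obtained. A common approach for sparse-autoencoder features, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and rank vocabulary tokens by the result: the most negative scores form one list, the most positive the other. The sketch below uses a hypothetical helper `top_logit_effects` and toy NumPy arrays; any normalization the dashboard applies to its displayed values is not reproduced.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Hypothetical helper: rank tokens by one feature's direct logit effect.

    w_dec_row: (d_model,) decoder direction of the feature (assumed input).
    w_unembed: (d_model, d_vocab) unembedding matrix (assumed input).
    vocab: list of d_vocab token strings.
    """
    # Direct effect of the feature direction on every output logit.
    effects = w_dec_row @ w_unembed
    # Ascending order: most negative effects first.
    order = np.argsort(effects)
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


if __name__ == "__main__":
    # Toy example with a 2-dimensional model and a 4-token vocabulary.
    vocab = ["a", "b", "c", "d"]
    w_dec_row = np.array([1.0, 0.0])
    w_unembed = np.array([[0.5, -1.0, 2.0, 0.0],
                          [9.0, 9.0, 9.0, 9.0]])
    print(top_logit_effects(w_dec_row, w_unembed, vocab, k=2))
```

    Listing both ends of the ranking separates tokens the feature promotes (here, casual conversational tokens) from tokens it suppresses (here, closing punctuation common in code).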

    Activation Density: 0.325%
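    Activation density is commonly reported as the percentage of token positions on which a feature's activation is positive. Assuming that definition (the source does not state it), 0.325% corresponds to 13 of every 4,000 tokens. A minimal sketch with a hypothetical `activation_density` helper; the zero threshold is an assumption:

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Hypothetical helper: percentage of positions where acts > threshold."""
    acts = np.asarray(acts, dtype=float)
    # Count positions where the feature fires, then express as a percentage.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


if __name__ == "__main__":
    # Toy activations: 13 firing positions out of 4,000 tokens.
    acts = np.zeros(4000)
    acts[:13] = 1.0
    print(activation_density(acts))
```

    Raising `threshold` above zero would ignore very weak activations and lower the reported density.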

    No Known Activations