    Explanations

    phrases expressing strong criticism or disbelief

    instances of the word "nonsense" and related phrases

    Negative Logits
    hani        -0.75
    redits      -0.72
    ez          -0.70
    irth        -0.70
    lis         -0.70
    ugal        -0.69
    hold        -0.69
    uve         -0.69
    yer         -0.68
    imb         -0.65
    Positive Logits
     nonsense     1.11
     detector     0.91
     excuses      0.89
     bullshit     0.86
     excuse       0.83
     rubbish      0.81
     crap         0.79
     guiActiveUn  0.77
    aceutical     0.77
     blah         0.77
    Activation Density 0.027%

    No Known Activations