INDEX
    Explanations

    words related to legal consequences or conditions

    Negative Logits
    iffe            -0.76
    isoft           -0.67
    imet            -0.61
    enic            -0.61
     Carbuncle      -0.61
    ortun           -0.60
     Neighbor       -0.60
    ãĤ¼             -0.59
    anqu            -0.59
    ainted          -0.58
    Positive Logits
     unequivocally  1.07
     bluntly        0.99
     emphatically   0.92
     plainly        0.91
     goodbye        0.79
     categor        0.75
     boldly         0.74
     quo            0.74
     "...           0.74
     confidently    0.73
    Activation Density: 0.011%

    No Known Activations