INDEX
    Explanations

    numerical references and quantities

    Negative Logits
     snippetHide
    -0.78
    réhen
    -0.72
    AddTagHelper
    -0.71
     eſt
    -0.71
     referenties
    -0.69
     Treue
    -0.68
    ]]\n
    -0.67
     ſch
    -0.67
     }],
    -0.67
    addGap
    -0.67
    Positive Logits
     fucking
    1.03
    fucking
    0.90
     FUCKING
    0.88
     goddamn
    0.85
     mierda
    0.78
    fuck
    0.78
     shitty
    0.77
     fuckin
    0.77
     fuck
    0.76
     shit
    0.75
    Activation Density: 1.628%

    No Known Activations