    Explanations

    extreme sentiment/controversy

    Negative Logits
     desplaz       0.38
    📈            0.37
    ״             0.36
     Jupyter       0.35
     COMEN         0.35
     Specifically  0.35
     Matem         0.34
     "\            0.33
     Tijdens       0.33   (Dutch: "during")
    \|            0.33
    Positive Logits
     cutest        0.43
     hatred        0.42
     hilarious     0.41
     cute          0.41
     adorable      0.39
     murderous     0.38
    Cute          0.37
    然后再        0.36   (Chinese: "and then")
    funny         0.36
     bigotry       0.36
    Activation Density 0.000%

    No Known Activations
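    Activation density is usually read as the percentage of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the function name `activation_density` and the threshold are illustrative assumptions, not this page's documented method. Note that a value printed as 0.000% could also hide a very small nonzero fraction after rounding. In this case it agrees with "No Known Activations".

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: percentage of positions whose activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    if acts.size == 0:
        return 0.0  # no samples: report zero density
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# A feature that never fires on the sampled tokens reports 0.000%.
print(f"{activation_density([0.0, 0.0, 0.0, 0.0]):.3f}%")
```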