Explanations
extreme sentiment/controversy
NEGATIVE LOGITS
desplaz        0.38
📈             0.37
״              0.36
Jupyter        0.35
COMEN          0.35
Specifically   0.35
Matem          0.34
"\             0.33
Tijdens        0.33
\|             0.33
POSITIVE LOGITS
cutest         0.43
hatred         0.42
hilarious      0.41
cute           0.41
adorable       0.39
murderous      0.38
Cute           0.37
然后再         0.36
funny          0.36
bigotry        0.36
Activations Density 0.000%