INDEX
Explanations
the word "not" with emphasis
phrases emphasizing negation or rejection
Negative Logits
å¥        -0.81
cano      -0.72
çļ        -0.70
alez      -0.69
kamp      -0.67
ously     -0.67
ounters   -0.67
peaks     -0.66
iox       -0.65
ãĤ¼       -0.65
Positive Logits
gonna        1.21
afraid       1.16
kidding      1.15
interested   1.14
bothered     1.14
necessarily  1.12
ashamed      1.07
aware        1.04
epad         1.02
amused       1.02
Activations Density: 0.112%