INDEX
Explanations
phrases indicating inclusion or exclusion of specific entities
phrases that express inclusion or emphasize exceptions
Negative Logits
estyles     -0.78
ahime       -0.73
ulations    -0.68
ript        -0.67
oren        -0.66
format      -0.65
bis         -0.65
iken        -0.65
ntax        -0.64
brace       -0.64
Positive Logits
myself       1.38
ourselves    1.34
yourselves   1.27
oneself      1.26
yourself     1.22
themselves   1.17
politicians  1.16
strangers    1.14
bureaucrats  1.12
outsiders    1.12
Activations Density 0.303%