INDEX
Explanations
phrases that start with "While" followed by a high activation word or phrase
conditional phrases that introduce contrasting or qualifying statements
Negative Logits
rium         -0.80
atron        -0.78
aer          -0.70
elled        -0.70
illet        -0.69
isable       -0.69
tnc          -0.69
enter        -0.68
omet         -0.67
ulnerable    -0.67
Positive Logits
acknowledging    1.12
browsing         0.96
researching      0.95
conced           0.91
discussing       0.89
agreeing         0.83
respecting       0.82
dismissing       0.79
mentioning       0.79
commenting       0.79
Activations Density 0.050%
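
The density figure above can be read as a simple ratio. The sketch below assumes that "activation density" means the fraction of sampled tokens on which the feature has a non-zero activation; the page does not state this definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and serve only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens with activation > threshold) / (all tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, 5 of which activate the feature.
sample = [0.0] * 9995 + [1.3, 0.7, 2.1, 0.4, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.050%"
```

Under this reading, a density of 0.050% corresponds to roughly one active token in every 2,000 sampled.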