Explanations
comparisons between different groups or entities
comparative statements regarding rates or advantages in health and social science contexts
Negative Logits
enthusi       -0.65
mathemat      -0.64
treaties      -0.63
Pok           -0.62
ovember       -0.62
Travels       -0.61
translator    -0.60
lia           -0.59
embassies     -0.57
revolutions   -0.57
Positive Logits
otherwise     0.95
placebo       0.90
passively     0.84
None          0.83
nons          0.80
merely        0.79
opposite      0.79
null          0.77
unaffected    0.75
nond          0.75
Activations Density: 0.292%