INDEX
Explanations
phrases related to the potential impact of actions on different aspects of society or the economy
conjunctions and phrases indicating causation or condition
Negative Logits
wic             -0.90
married         -0.77
lass            -0.77
DragonMagazine  -0.74
estern          -0.73
lication        -0.70
ached           -0.69
aws             -0.68
Latest          -0.68
ials            -0.68
Positive Logits
thereby       1.18
consequently  1.04
hence         0.98
thus          0.98
promotes      0.97
reduces       0.92
enhances      0.92
reduce        0.90
therefore     0.89
prevents      0.89
Activations Density 0.282%