INDEX
Explanations
verbs related to making decisions or taking actions
phrases that express intentions or goals
Negative Logits
Else            -0.63
Canary          -0.62
Mens            -0.61
Relief          -0.61
Pak             -0.61
Appropriations  -0.58
Plus            -0.58
Cosponsors      -0.58
casualty        -0.57
afety           -0.57
Positive Logits
accomplish      1.15
emulate         1.08
achieve         1.06
eradicate       0.91
avoid           0.90
eliminate       0.90
attain          0.86
solve           0.86
pursue          0.85
improve         0.85
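The two lists above rank the tokens whose output logits this feature pushes down most and up most. The sketch below shows how such lists are commonly produced. It assumes each value is the dot product of the feature's decoder direction with the corresponding column of the model's unembedding matrix (W_U). Real dashboards may also fold in layer norm or other scaling, so this is not necessarily how these particular numbers were computed. The function name `top_bottom_logits`, the 2-dimensional vectors, and the 4-token vocabulary are hypothetical toy stand-ins.

```python
def top_bottom_logits(decoder_dir, unembed, vocab, k=3):
    """Rank tokens by a feature's direct effect on each token's logit.

    Assumption: effect[j] = decoder_dir . unembed[:, j], i.e. the feature's
    decoder direction projected through the unembedding matrix (d x V).
    decoder_dir: list of d floats (hypothetical feature direction)
    unembed:     list of d rows, each a list of len(vocab) floats
    """
    d = len(decoder_dir)
    # Direct logit effect for every vocabulary token j.
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative effect first
    positive = ranked[::-1][:k]  # most positive effect first
    return negative, positive


# Toy example with made-up numbers: 2-dim residual stream, 4 tokens.
vocab = ["achieve", "accomplish", "Else", "Canary"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.8, 1.0, -0.4, -0.2],
    [0.4, 0.2, -0.6, -0.8],
]
neg, pos = top_bottom_logits(decoder_dir, unembed, vocab, k=2)
print([tok for tok, _ in neg])  # ['Else', 'Canary']
print([tok for tok, _ in pos])  # ['accomplish', 'achieve']
```

Sorting once and slicing from both ends yields the "Negative Logits" and "Positive Logits" lists in a single pass over the vocabulary.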
Activation Density: 0.123%
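Activation density is the share of sampled token positions on which the feature is active. The sketch below assumes "active" means an activation strictly greater than zero, which is a common convention but not confirmed for this dashboard. The function name `activation_density` and the toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold.

    Assumption: a position counts as "firing" when activation > threshold (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy sample: 2 of 8 positions fire.
acts = [0.0, 0.0, 1.7, 0.0, 0.0, 0.3, 0.0, 0.0]
density = activation_density(acts)
print(f"{density:.3%}")  # 25.000%
```

Under this definition, the 0.123% shown above means the feature is active on about 1.23 of every 1,000 sampled tokens.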