Explanations
phrases related to giving rewards or consequences based on actions and decisions
instances of personal empowerment and individual responsibility
Negative Logits
earch       -0.76
oly         -0.71
enne        -0.70
scroll      -0.70
Favorite    -0.68
UGE         -0.68
Scroll      -0.68
Feature     -0.67
Flight      -0.66
gran        -0.64
Positive Logits
unwittingly        1.41
inadvertently      1.37
implicitly         1.30
thereby            1.23
indirectly         1.22
tacit              1.17
depri              1.16
unintentionally    1.16
legitim            1.11
perpet             1.06
Activation Density: 0.259%