INDEX
Explanations
terms related to consequences and punishment
Negative Logits
__":            -0.73
__':            -0.70
principalTable  -0.67
ComVisible      -0.63
__":            -0.62
__':            -0.61
ViewImports     -0.58
ویکیپدیا        -0.58
httphttps       -0.57
@"/             -0.56
Positive Logits
consequences    0.99
Consequences    0.83
Consequences    0.75
benefits        0.74
repercussions   0.74
rewards         0.70
consequence     0.70
ramifications   0.70
后果            0.65
stakes          0.64
Activations Density 0.471%