INDEX
Explanations
phrases or statements related to cause and effect
concepts related to the impact of actions and beliefs on outcomes
Negative Logits
ħĭ          -0.82
etheless    -0.72
çͰ          -0.70
oute        -0.68
WithNo      -0.68
luaj        -0.67
osate       -0.65
etz         -0.63
Ther        -0.63
Siberian    -0.61
Positive Logits
nurt        0.80
winners     0.76
paycheck    0.74
tangible    0.74
obey        0.74
entertain   0.74
smiles      0.73
deeds       0.72
measurable  0.70
rewards     0.70
Activations Density: 1.102%