Explanations
Instances of altruistic actions or good deeds
Negative Logits
redients      -0.78
rough         -0.64
preliminary   -0.63
basics        -0.61
ellow         -0.57
Rough         -0.56
ications      -0.55
Extras        -0.55
Upton         -0.55
ils           -0.53
Positive Logits
never         2.09
never         2.06
NEVER         1.96
Never         1.87
Never         1.79
ALWAYS        1.69
always        1.68
always        1.65
ever          1.58
Always        1.50
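A note on reading the two lists above. SAE feature dashboards commonly score each vocabulary token by taking the dot product of the feature's decoder direction with that token's unembedding vector, then listing the most negative and most positive scores. Whether this dashboard uses exactly that computation is an assumption. The toy sketch below (hypothetical names, a made-up two-dimensional residual stream, and a four-token vocabulary) shows only the scoring and ranking step.

```python
# Hypothetical toy sketch: score each token by dot(decoder_vec, unembedding column),
# then split into the most positive and most negative tokens.
# Assumption: this mirrors how "Positive/Negative Logits" lists are usually built.

def logit_effects(decoder_vec, unembed, vocab):
    """Return (token, score) pairs; unembed is a d_model x vocab_size list of rows."""
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        scores.append((token, score))
    return scores

def top_and_bottom(scores, k):
    """Return the k highest-scoring and the k lowest-scoring (token, score) pairs."""
    ordered = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return ordered[:k], ordered[::-1][:k]

vocab = ["never", "always", "rough", "basics"]
decoder_vec = [1.0, -0.5]            # toy decoder direction (made up)
unembed = [
    [2.0, 1.5, -0.5, -0.4],          # residual dimension 0
    [0.0, 0.2, 0.6, 0.4],            # residual dimension 1
]

positive, negative = top_and_bottom(logit_effects(decoder_vec, unembed, vocab), k=2)
print([t for t, _ in positive])      # most positive tokens first
print([t for t, _ in negative])      # most negative tokens first
```

In the toy data, "never" and "always" get the largest positive scores while "rough" and "basics" get the most negative, which is the same shape as the lists above but uses invented numbers.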
Activation Density: 0.208%
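Activation density is usually reported as the percentage of tokens on which the feature fires. The minimal sketch below assumes "fires" means an activation strictly greater than zero; the function name and the toy data are hypothetical.

```python
# Minimal sketch, assuming activation density = percentage of token positions
# where the feature's activation is strictly positive.

def activation_density(activations):
    """Percentage of token positions with activation > 0."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)

# Toy example: 2 firing tokens out of 1,000.
acts = [0.0] * 1000
acts[10] = 3.1
acts[500] = 0.7
print(f"{activation_density(acts):.3f}%")  # prints 0.200%
```

Under that reading, a density of 0.208% means the feature is active on roughly 2 tokens in every 1,000.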