INDEX
Explanations
acts of kindness or altruism
activities related to altruism and helping others
Negative Logits
Entered     -0.57
Eisen       -0.55
Haw         -0.54
Conc        -0.54
Construct   -0.54
Faul        -0.53
Hob         -0.52
Kab         -0.51
Dism        -0.50
Hol         -0.50
Positive Logits
essions     0.75
Reviewer    0.69
ariat       0.69
ppelin      0.69
ciation     0.68
roit        0.68
rican       0.67
RBI         0.64
imens       0.64
inburgh     0.64
Activation Density: 0.461%