INDEX
Explanations
instances of altruistic actions or helpful behaviors
expressions related to assistance and support
Negative Logits
Token            Logit
ovember          -0.89
Available        -0.81
Stretch          -0.77
idan             -0.76
appropriately    -0.75
Said             -0.71
Rem              -0.70
Especially       -0.70
Get              -0.70
shed             -0.69
Positive Logits
Token            Logit
bike             0.88
commerce         0.83
physic           0.83
bicycle          0.79
automobile       0.79
money            0.79
automobiles      0.76
human            0.75
transportation   0.75
person           0.75
Activations Density: 0.577%