INDEX
Explanations
sentiments related to altruism and social support
Negative Logits
luv                  -0.15
ulet                 -0.15
ãĤ¤ãĥ¤               -0.15
amba                 -0.15
pushViewController   -0.14
utron                -0.14
ÏĦÏī                 -0.14
ắn                   -0.13
еÑģÑĤи               -0.13
æ·                   -0.13
Positive Logits
step       0.53
stepped    0.51
step       0.47
stepping   0.45
steps      0.44
Step       0.43
Step       0.42
-step      0.42
STEP       0.39
.step      0.39
Activations Density 0.396%