INDEX
Explanations
kindness, support, and values
Negative Logits
palpable   0.65
dynamic    0.65
edge       0.61
young      0.60
condition  0.60
collar     0.59
tack       0.59
anchor     0.59
seal       0.59
ও          0.59
Positive Logits
Justice  0.75
Service  0.74
Serving  0.74
Loot     0.71
贡献     0.70
Kind     0.70
幫助     0.69
Serving  0.69
善良     0.69
服務     0.68
Activations Density 0.232%