Explanations
expressions of kindness and generosity
Negative Logits
createState      -0.73
ⓧ                -0.73
endpush          -0.65
verhe            -0.58
nere             -0.56
superiori        -0.56
réve             -0.56
bodem            -0.56
Temptation       -0.55
modernization    -0.54
Positive Logits
kindness         1.31
kindness         1.14
compassionate    1.09
Kindness         1.07
compassion       1.05
Compassion       1.02
generosity       0.95
warm             0.95
charitable       0.92
altru            0.92
Activations Density 0.234%