Explanations
concepts related to empathy and social welfare
Negative Logits
method          -0.49
Erreferentziak  -0.49
Colin           -0.47
Наводи          -0.46
ENCES           -0.46
model           -0.46
האם             -0.46
[unrendered]    -0.45
círculo         -0.45
esub            -0.45
Positive Logits
AttributeSet    0.85
pleaſure        0.81
wellbeing       0.80
welfare         0.80
sake            0.74
gainera         0.72
+#+#            0.69
脚注の使い方     0.69
purpoſe         0.67
ViewFeatures    0.67
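The page does not say how the positive and negative logit lists are produced. A common convention for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and then list the largest and smallest entries. The sketch below assumes that convention (no LayerNorm folding). The helper name `top_logit_effects` and the random toy weights are hypothetical, purely for illustration, and carry no real model data.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by a feature's direct logit effect.

    Assumes effect = w_dec_row @ w_u, i.e. the feature's decoder direction
    projected through the unembedding with no LayerNorm applied.
    """
    effects = w_dec_row @ w_u            # one value per vocabulary token
    order = np.argsort(effects)          # ascending order
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with random weights (illustrative only, not real model data).
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20
vocab = [f"tok{i}" for i in range(vocab_size)]
w_dec_row = rng.normal(size=d_model)
w_u = rng.normal(size=(d_model, vocab_size))

positive, negative = top_logit_effects(w_dec_row, w_u, vocab, k=3)
for tok, val in positive + negative:
    print(f"{tok:>8} {val:+.2f}")
```

Under this reading, a token such as "welfare" at 0.80 would be one the feature directly pushes the model toward predicting, and "method" at -0.49 one it pushes away from.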
Activation Density: 0.236%
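Activation density is not defined on the page. A minimal sketch, assuming it means the percentage of sampled token positions on which the feature's activation exceeds zero: a density of 0.236% would then correspond to roughly one token in 424. The function name `activation_density` and the toy activations are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: percentage of token positions with activation > threshold.

    Assumes "density" means firing frequency over a sample of tokens;
    the threshold of 0 is an assumption, not taken from the dashboard.
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy activations for 1,000 tokens: the feature fires on 2 of them.
acts = np.zeros(1000)
acts[[17, 403]] = [1.3, 0.4]
print(f"{activation_density(acts):.3f}%")  # 0.200%
```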