Explanations
abstract concepts and qualities
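Negative Logits
Token           Value   Gloss
oufl            0.27
researches      0.26
berbagai        0.25    Indonesian "various"
هایی            0.24    Persian plural suffix
informations    0.23
的一些           0.23    Chinese "some of"
Technologies    0.22
olyan           0.22    Hungarian "such"
интересу        0.22    Russian "interest" (dative)
<unused2121>    0.22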
Positive Logits
Token           Value
diplomacy       0.49
physicality     0.49
activism        0.46
practicality    0.45
politics        0.44
philanthropy    0.43
togetherness    0.43
bureaucracy     0.43
athleticism     0.43
idealism        0.43
Activation Density: 2.014%