Explanations
concepts related to virtue and moral character
Negative Logits
gili      -0.17
ź         -0.16
имÑĥ      -0.15
CMP       -0.15
å¯        -0.14
å¾Ħ       -0.14
adoo      -0.14
erness    -0.14
acters    -0.14
ÏĢει      -0.13
Positive Logits
kabil     0.17
fully     0.16
-null     0.16
edef      0.15
egl       0.15
avir      0.14
pets      0.14
758       0.14
218       0.14
ove       0.14
Activations Density 0.003%