INDEX
Explanations
attributes related to authenticity and properness
Negative Logits
Token         Logit
ire           -0.15
éĨĴ           -0.14
lon           -0.14
端            -0.13
etto          -0.13
ä¿¡           -0.13
Hath          -0.13
redd          -0.13
additional    -0.13
resil         -0.13
Positive Logits
Token         Logit
æŁĦ           0.15
proper        0.15
onen          0.15
ayne          0.15
itesse        0.15
.unregister   0.14
dy            0.14
auc           0.14
addir         0.14
enance        0.14
Activations Density: 0.342%