NEGATIVE LOGITS
inidad            0.47
featureType       0.38
sensitivity       0.37
இல                0.37
تيب               0.36
आव                0.36
unsuccessfully    0.35
complexity        0.35
forbidden         0.35
interactivity     0.35
POSITIVE LOGITS
trust              2.33
trusted            2.19
trustworthy        2.16
trusting           2.13
Trust              2.09
Trust              2.08
trust              2.08
trustworthiness    2.02
TRUST              2.02
信頼               1.99
Activations Density 0.020%