INDEX
Explanations
concepts related to emotions and interpersonal trust
Negative Logits
ácil      -0.15
utc       -0.15
utto      -0.14
æ±        -0.14
hell      -0.14
erte      -0.14
ipeline   -0.14
津        -0.14
uto       -0.14
mino      -0.14
Positive Logits
misplaced       0.35
justified       0.33
warranted       0.28
founded         0.22
deserved        0.22
justify         0.22
justification   0.22
-founded        0.21
unf             0.20
arrant          0.19
Activations Density 0.206%