INDEX
Explanations
statements indicating the existence or condition of entities
Negative Logits
ado       -0.71
oids      -0.69
ties      -0.68
udeau     -0.67
ths       -0.67
tendency  -0.67
rant      -0.66
penchant  -0.64
olor      -0.64
urous     -0.63
POSITIVE LOGITS
properly    1.03
able        1.01
adequately  0.93
safe        0.92
abella      0.88
Ħ¢          0.88
aware       0.88
capable     0.88
ready       0.88
correct     0.85
Activations Density: 0.110%