INDEX
Explanations
text related to specific characteristics or qualities
mention of characteristics or traits in various contexts
Negative Logits
Token      Logit
aii        -0.83
NAS        -0.73
kos        -0.72
atorial    -0.71
endar      -0.69
ILLE       -0.69
EMS        -0.68
ADS        -0.67
psc        -0.67
heed       -0.65
Positive Logits
Token              Logit
characteristics    1.09
traits             1.05
istics             0.92
qualities          0.84
properties         0.80
Features           0.79
ively              0.79
attributes         0.78
weaknesses         0.71
natureconservancy  0.71
Activations Density 0.030%
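
The density figure above can be read as the share of token positions on which the feature fires. The page does not say how it is computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a non-zero
    (above-threshold) activation. The source page does not confirm this.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active positions out of 10,000 tokens.
sample = [0.0] * 9997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.030%
```

Under this reading, a density of 0.030% would correspond to roughly 3 active positions per 10,000 tokens, which is consistent with a sparse feature tied to a narrow set of words such as "characteristics" and "traits".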