INDEX
Explanations
terms related to specific attributes or features
references to characteristics
Negative Logits
NAS             -0.74
GS              -0.73
kos             -0.68
arger           -0.68
EMS             -0.68
gur             -0.67
aii             -0.67
×Ķ              -0.67
ski             -0.66
BSD             -0.65
Positive Logits
characteristics  1.06
istics           1.06
traits           0.98
Features         0.85
ively            0.84
eatures          0.83
charact          0.83
similarities     0.81
guiActiveUn      0.80
qualities        0.79
Activation Density: 0.012%