Explanations
references to specific conditions or categories
Negative Logits
Token    Logit
ust      -0.21
انه      -0.19
stad     -0.17
ustum    -0.15
awn      -0.15
stead    -0.15
vet      -0.14
side     -0.14
sta      -0.14
enda     -0.14
Positive Logits
Token       Logit
kinds       0.20
-purpose    0.20
程度        0.17
kind        0.16
;y          0.16
akin        0.16
htub        0.15
.kind       0.15
iability    0.15
/all        0.15
Activations Density 0.017%