Explanations
phrases and terms related to classification and categorization
Negative Logits
ingle
-0.19
INGLE
-0.17
-transitional
-0.17
{{--<
-0.16
ullet
-0.16
extinction
-0.15
ffect
-0.14
Naj
-0.14
BED
-0.14
åij¼
-0.14
Positive Logits
ova
0.15
062
0.15
Twice
0.14
rapy
0.14
entar
0.13
div
0.13
abox
0.13
лава
0.13
087
0.13
aire
0.13
Activations Density 0.006%