INDEX
Explanations
phrases expressing types or categories
Negative Logits
Token       Logit
htë         -0.45
васто       -0.44
سد          -0.43
spalle      -0.42
douard      -0.42
MockBean    -0.41
Cardiff     -0.40
utches      -0.40
lava        -0.40
igraf       -0.40
Positive Logits
Token       Logit
kind        1.82
kind        1.74
Kind        1.68
KIND        1.68
Kind        1.61
KIND        1.49
kinds       1.43
kinds       1.35
Kinds       1.23
sort        1.18
Activations Density: 0.085%