INDEX
Explanations
words related to categorization and classification
Negative Logits
ially      -0.07
oog        -0.06
ALLY       -0.06
ayan       -0.05
omorphic   -0.05
gas        -0.05
che        -0.05
branch     -0.05
kh         -0.05
gymn       -0.05
Positive Logits
kening     0.08
ØŃات       0.08
imento     0.07
Ñĩив       0.07
Loose      0.07
inkel      0.07
ecast      0.07
ingen      0.07
kan        0.07
ulong      0.07
Activations Density: 0.043%