Explanations
phrases indicating classification or categorization
Negative Logits
Token           Logit
surla           -0.45
তথ্যসূত্র        -0.40
murale          -0.39
domés           -0.39
Manbalar        -0.38
aveug           -0.38
⋙               -0.34
Ness            -0.34
Cosmetic        -0.34
AnchorStyles    -0.34
Positive Logits
Token           Logit
sorta           0.80
Kinda           0.77
somewhat        0.76
kinda           0.72
Somewhat        0.69
Somewhat        0.68
Kinda           0.68
styleable       0.66
kinda           0.64
somewhat        0.63
Activations Density 0.176%