Explanations
phrases that express categorization or qualification
Negative Logits
surla  -0.45
HasFactory  -0.40
Manbalar  -0.40
pinulongan  -0.39
AnchorStyles  -0.39
thermique  -0.37
Билгалдахарш  -0.36
+#+#  -0.36
kasarigan  -0.35
তথ্যসূত্র  -0.34
Positive Logits
Kinda  0.77
sorta  0.73
Kinda  0.73
kinda  0.65
Somewhat  0.64
Somewhat  0.64
kinda  0.63
なんとなく  0.61
somewhat  0.61
quasi  0.59
Activations Density: 0.188%