Explanations
phrases indicating product features, specifications, and comparisons in various contexts
Negative Logits
ibri       -0.15
hower      -0.15
ες         -0.14
پس         -0.14
/DD        -0.14
instances  -0.14
Allowed    -0.14
pton       -0.13
oothing    -0.13
ku         -0.13
Positive Logits
type       0.26
kind       0.24
kinds      0.22
exact      0.22
type       0.21
类型       0.21
jenis      0.21
exact      0.20
tipo       0.19
.kind      0.19
Activations Density 0.152%