INDEX
Explanations
quantifiers and adjectives related to quality or extent
Negative Logits
Token       Logit
277         -0.07
utenberg    -0.07
pps         -0.07
istik       -0.07
ostel       -0.07
264         -0.06
267         -0.06
958         -0.06
STS         -0.06
bam         -0.06
Positive Logits
Token       Logit
zee         0.07
द           0.06
cate        0.06
δε          0.06
publicity   0.06
league      0.06
eller       0.06
vant        0.06
oodle       0.06
icient      0.06
Activations Density: 0.031%