INDEX
Explanations
phrases that indicate quality or satisfaction
Negative Logits
Token     Logit
elect     -0.17
yms       -0.17
elig      -0.17
yll       -0.17
ettes     -0.17
yen       -0.16
yonel     -0.16
Ø´ÙĨ      -0.16
igue      -0.16
ein       -0.16
Positive Logits
Token     Logit
-known    0.30
spring    0.28
ington    0.28
ows       0.26
come      0.22
-being    0.22
-rounded  0.20
l         0.20
INGTON    0.20
inger     0.19
Activations Density: 0.072%