INDEX
Explanations
words related to lack of constraint or restriction
Negative Logits
++++      -0.75
issance   -0.73
aurus     -0.71
ococ      -0.71
oleon     -0.71
IENCE     -0.71
Hilton    -0.68
ilee      -0.68
ior       -0.67
psey      -0.67
Positive Logits
leaf      1.13
weed      0.96
cloth     0.93
nesses    0.91
ness      0.87
coupling  0.87
luster    0.83
fitting   0.82
cannon    0.82
lings     0.82
Activations Density 0.007%