INDEX
Explanations
references to the concept of "use" in various contexts
Negative Logits
theless   -0.21
ships     -0.20
ly        -0.19
ship      -0.18
ness      -0.18
rim       -0.17
useful    -0.17
usual     -0.17
raz       -0.17
ishly     -0.16
Positive Logits
fully     0.41
age       0.40
full      0.39
fulness   0.37
ful       0.36
able      0.33
FUL       0.32
lessly    0.30
ages      0.25
AGE       0.24
Activations Density: 0.083%