INDEX
Explanations
references to the concept of "use" in various contexts
Negative Logits
ly        -0.21
dy        -0.20
lain      -0.20
theless   -0.19
ness      -0.18
ë²Ī       -0.16
sch       -0.16
ishly     -0.16
shan      -0.16
shot      -0.15
Positive Logits
full      0.37
fulness   0.36
age       0.33
ful       0.31
able      0.28
fully     0.28
ability   0.27
FUL       0.24
AGE       0.24
conds     0.23
Activations Density: 0.031%