INDEX
Explanations
terms and phrases related to knowledge and learning
Negative Logits
fty        -0.17
uko        -0.17
/group     -0.16
ondheim    -0.16
phen       -0.15
ny         -0.15
uth        -0.15
istol      -0.15
ross       -0.14
loh        -0.14
Positive Logits
base           0.36
base           0.34
ably           0.30
-base          0.30
Base           0.28
ability        0.26
transfer       0.25
bases          0.24
_base          0.24
acquisition    0.23
Activations Density: 0.029%