INDEX
Explanations
terms related to communication and organization
Negative Logits
ampo          -0.20
arness        -0.16
$route        -0.15
ennent        -0.15
vit           -0.14
acket         -0.14
ibling        -0.14
_construct    -0.14
Ear           -0.14
AILY          -0.14
Positive Logits
means         0.24
Means         0.20
means         0.20
Means         0.20
sources       0.18
Sources       0.17
Mode          0.17
forms         0.17
-mode         0.17
eki           0.17
Activations Density: 0.028%