Explanations
phrases indicating purpose or usefulness
Negative Logits
ldb       -0.14
mor       -0.14
ram       -0.14
\grid     -0.14
Felix     -0.14
rash      -0.13
rips      -0.13
Weights   -0.13
onomy     -0.13
mun       -0.13
Positive Logits
purposes   0.20
yna        0.18
τρο        0.16
dust       0.15
κος        0.15
reasons    0.15
amina      0.15
usement    0.15
upe        0.15
ays        0.14
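One common way to produce positive and negative logit lists like these is to project a feature's decoder direction through the model's unembedding matrix, which gives one score per vocabulary token, and then report the largest and smallest scores. The sketch below assumes that method. The function names `logit_effects` and `top_and_bottom` and all the numbers are hypothetical toy values, not the dashboard's actual weights or code.

```python
# Minimal sketch, ASSUMING a feature's logit lists come from projecting its
# decoder direction onto the unembedding matrix: effect = d_dec @ W_U.
# Names and values here are hypothetical toy data, not the real model.

def logit_effects(d_dec, W_U):
    """Return one score per vocab entry: sum over i of d_dec[i] * W_U[i][v]."""
    vocab_size = len(W_U[0])
    return [sum(d_dec[i] * W_U[i][v] for i in range(len(d_dec)))
            for v in range(vocab_size)]


def top_and_bottom(effects, vocab, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative score first
    positive = ranked[::-1][:k]  # most positive score first
    return positive, negative


if __name__ == "__main__":
    vocab = ["purposes", "reasons", "ldb", "mor", "dust"]
    d_dec = [0.5, -0.2]                   # toy 2-dim decoder direction
    W_U = [[0.4, 0.3, -0.2, -0.1, 0.2],   # toy 2 x 5 unembedding matrix
           [0.0, 0.1, 0.2, 0.3, 0.1]]
    pos, neg = top_and_bottom(logit_effects(d_dec, W_U), vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

A real model has a vocabulary of tens of thousands of tokens, so an implementation would use a matrix library rather than nested loops. The ranking step stays the same.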
Activations Density 0.352%
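Activation density is usually read as the share of token positions on which the feature fires. Under that assumption, with "fires" meaning an activation strictly above zero, 0.352% corresponds to about 352 of every 100,000 tokens. The helper `activation_density` below is a hypothetical illustration of that calculation, not the dashboard's own code.

```python
# Minimal sketch, ASSUMING density = percentage of token positions whose
# activation is strictly above a threshold (0 by default).
# `activation_density` is a hypothetical helper, not a real library API.

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Toy data: the feature fires on 352 of 100,000 token positions.
    acts = [0.0] * 99_648 + [1.0] * 352
    print(f"Activations Density {activation_density(acts):.3f}%")
```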