INDEX
Explanations
phrases indicating assistance or support
Negative Logits
Token         Logit
914           -0.17
.persistent   -0.17
pery          -0.15
asal          -0.15
cka           -0.14
enia          -0.14
ëĪĦ           -0.14
tha           -0.14
еÑĩно         -0.14
adem          -0.13
Positive Logits
Token         Logit
ease          0.42
ease          0.36
abandon       0.35
Ease          0.34
precision     0.31
Ease          0.29
gusto         0.29
apl           0.27
efficiency    0.26
vigor         0.25
Activations Density 0.167%
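
The density figure above is commonly read as the fraction of sampled token positions on which the feature fires. The sketch below illustrates that reading only; it assumes "fires" means an activation strictly above a threshold of 0.0, and the function name `activation_density` and the sample data are hypothetical, not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a position counts as active when its value is strictly
    greater than `threshold`; the dashboard may use a different rule.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1,000 token positions, 2 of which are active.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.200%"
```

Under this reading, a density of 0.167% would correspond to roughly 1 active position in every 600 sampled tokens.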