INDEX
Explanations
adverbs that indicate effective or proper performance in actions
Negative Logits
まった          -0.59
coper           -0.55
substitution    -0.54
Substitution    -0.54
inspira         -0.52
gekomen         -0.52
animés          -0.51
perbaikan       -0.51
grö             -0.51
as              -0.51
Positive Logits
)";             0.85
correctly       0.83
cibly           0.83
denly           0.82
]));            0.81
edly            0.80
safely          0.79
]),             0.79
ligently        0.78
oughly          0.77
Activations Density 0.381%