INDEX
Explanations
words that indicate a strong impact or significant change
NEGATIVE LOGITS
aspers    -0.15
/GPL      -0.15
Baxter    -0.15
enger     -0.15
デル      -0.14
absol     -0.14
bach      -0.14
pps       -0.14
bos       -0.14
.tp       -0.14
POSITIVE LOGITS
提高      0.18
æĿ        0.18
different 0.18
improve   0.17
improves  0.17
atta      0.16
oup       0.16
improved  0.16
Improve   0.16
oupon     0.16
Activations Density: 0.047%