INDEX
Explanations
phrases related to making modifications or adjustments
Negative Logits
Token       Logit
aha         -0.16
otlin       -0.16
amina       -0.15
uma         -0.15
obia        -0.14
iele        -0.14
Copyright   -0.14
redients    -0.14
ugins       -0.14
ambre       -0.14
Positive Logits
Token       Logit
slightly    0.93
somewhat    0.73
slight      0.63
немного     0.59
biraz       0.56
bit         0.54
trochu      0.51
chút        0.49
трохи       0.49
稍          0.48
Activations Density 0.684%