INDEX
Explanations
phrases that begin with the word "So."
Negative Logits
duk       -0.15
dabei     -0.15
icaret    -0.15
ponse     -0.15
idis      -0.15
IONS      -0.14
so        -0.14
lon       -0.14
erse      -0.14
nty       -0.14
Positive Logits
oner      0.26
-called   0.24
ftware    0.20
aked      0.20
aring     0.19
although  0.19
instead   0.19
far       0.19
apy       0.18
fter      0.18
Activations Density: 0.045%