Explanations
phrases or sentences that discuss emotional or psychological struggles
Negative Logits
yet        -0.17
orthy      -0.15
ucas       -0.15
isti       -0.15
illes      -0.14
yet        -0.14
ustin      -0.14
Yet        -0.14
lei        -0.13
ilk        -0.13
Positive Logits
though      0.18
although    0.18
although    0.17
though      0.17
oldur       0.17
because     0.16
Though      0.15
because     0.15
dit         0.15
dun         0.14
Activations Density: 0.308%