Explanations
expressions of affection and appreciation
Negative Logits
Token     Logit
zelf      -0.14
irc       -0.13
erra      -0.13
another   -0.13
appen     -0.13
href      -0.13
anton     -0.13
-même     -0.13
soon      -0.13
roi       -0.13
Positive Logits
Token        Logit
how          0.29
hearing      0.28
seeing       0.23
everything   0.23
nothing      0.22
eeee         0.22
anything     0.20
ee           0.20
eee          0.20
-lo          0.19
Activations Density 0.089%