INDEX
Explanations
instances of discussions and dialogues
Negative Logits
ossip          -0.17
lem            -0.15
lemn           -0.15
569            -0.14
communicate    -0.14
/DD            -0.14
carp           -0.14
hlas           -0.14
unk            -0.14
ery            -0.14
Positive Logits
starter        0.27
about          0.24
starters       0.24
starter        0.24
Starter        0.23
-about         0.20
_about         0.20
About          0.19
ative          0.19
about          0.18
Activations Density 0.053%