Explanations
instances of affirmation or agreement in dialogue
Negative Logits
aine        -0.16
uben        -0.16
obili       -0.15
766         -0.15
\Module     -0.14
ixel        -0.14
uzzi        -0.14
Host        -0.14
zas         -0.14
uct         -0.14
Positive Logits
Lever        0.15
Lambert      0.15
bridge       0.15
arc          0.15
Spectrum     0.14
Spe          0.14
_TUN         0.14
implicitly   0.14
cliff        0.14
シリーズ     0.13
Activation density: 0.023%
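A density of 0.023% means the feature fires on roughly one token in every 4,300 (1 / 0.00023 ≈ 4,348). The sketch below shows one common way such a figure can be computed. It assumes density is the percentage of token positions whose activation is strictly above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions where a feature fires.

    Assumption: "fires" means activation > threshold (default 0.0).
    `activations` is a flat list of per-token activations for one feature.
    """
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the feature is active.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical data: the feature fires on 23 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(23):
    acts[i * 4000] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.023%
```

Raising `threshold` above zero would ignore very weak activations, which lowers the reported density; the dashboard does not say which cutoff it uses.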