INDEX
Explanations
dialogue-related phrases and discussions
references to conversation or discussion in various forms
Negative Logits
cot       -0.76
rule      -0.74
addons    -0.74
isher     -0.73
ulz       -0.72
arah      -0.71
cheat     -0.70
rug       -0.69
old       -0.69
innon     -0.69
Positive Logits
dialogue        1.04
naire           0.99
ogue            0.87
Franç           0.84
ues             0.84
Dialogue        0.83
conversation    0.77
dialog          0.77
reperto         0.77
dayName         0.75
Activations Density: 0.018%