INDEX
Explanations
conversational transitions and expressions of digression
Negative Logits
AndEndTag     -0.90
+#+#          -0.87
Pourtant      -0.79
виправивши    -0.74
auroit        -0.73
hés           -0.72
EndInit       -0.71
StructEnd     -0.71
Monfieur      -0.70
Abonnez       -0.69
Positive Logits
digress       0.81
Anyway        0.61
details       0.60
Briefly       0.55
Back          0.52
Details       0.51
Anyway        0.51
details       0.50
Anyways       0.48
Back          0.46
Activations Density: 0.178%