INDEX
Explanations
phrases indicating advice or recommendations
seeking or offering tips
Negative Logits
<bos>    -0.59
a        -0.44
A        -0.39
↵↵       -0.38
A        -0.37
objects  -0.35
und      -0.35
ander    -0.35
rent     -0.35
an       -0.35
POSITIVE LOGITS
tips     2.06
Tips     1.89
Tips     1.72
tips     1.70
TIPS     1.59
Tipps    1.55
TIPS     1.45
Tipps    1.24
tipps    1.15
dicas    1.11
Activations Density: 0.003%