INDEX
Explanations
phrases related to requests or offers for help
Negative Logits
Token       Logit
".          -0.86
).}         -0.85
).”         -0.85
).          -0.84
).'         -0.81
)."         -0.80
[…]         -0.78
[…]         -0.77
).</        -0.75
'])->       -0.75
Positive Logits
Token       Logit
inderdaad   0.86
thread      0.83
OP          0.81
<bos>       0.72
...@        0.71
FTFY        0.69
↵↵↵         0.68
indeed      0.66
downvoted   0.66
@           0.66
Activations Density: 0.732%