INDEX
Explanations
based on financial or technical details
tokens that occur in the assistant's long, explanatory reply text — especially opening/discourse tokens (like "Okay,") and other words in extended model-generated responses.
Negative Logits
myButtons  0.52
ड़ने  0.52
ಂಟು  0.49
придется  0.49
bisog  0.49
кость  0.48
quaisquer  0.48
אים  0.47
Polaribacter  0.47
ान्य  0.47
Positive Logits
github  0.53
(  0.51
Cl  0.49
ref  0.47
seed  0.47
static  0.46
water  0.46
line  0.46
ll  0.45
style  0.45
Activations Density 0.001%