Explanations
noticing unusual or surprising things
Negative Logits
Token             Value
unfortunately     0.62
malheureusement   0.61
Unfortunately     0.57
heureusement      0.54
Unfortunately     0.54
unfortunately     0.53
aufgrund          0.50
fortunately       0.49
Sadly             0.49
दुर्भाग्य         0.48
Positive Logits
Token             Value
明明              0.73
seem              0.70
seemingly         0.67
seem              0.66
none              0.64
почти             0.64
semblent          0.63
Neither           0.63
despite           0.61
好像              0.61
Activations Density 0.006%