Explanations
acknowledgment words after a comma
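Negative Logits
0.46  却 (Chinese, "but, however")
0.43  However
0.42  Alternatively
0.41  Moreover
0.39  Although
0.38  否 (Chinese, "no, not")
0.38  Includes
0.38  较低 (Chinese, "lower")
0.38  卻 (traditional form of 却, "but, however")
0.38  (token not shown)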
Positive Logits
0.82  alright
0.75  Alright
0.74  buckle
0.70  sounds
0.66  okay
0.63  那我們 (Chinese, "then we, so we")
0.63  glad
0.62  Alright
0.61  allons (French, "let's go")
0.61  definitely
Activation density: 0.258%
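The numbers above are the kind a sparse-autoencoder feature dashboard usually reports. The following is a minimal sketch of how such figures are commonly derived, and it assumes two things: activation density is the fraction of token positions where the feature's activation is above zero, and the logit lists come from projecting the feature's decoder direction onto the model's unembedding matrix and sorting the scores. Under the first assumption, 0.258% means the feature fires on roughly 2.6 of every 1,000 tokens. The function names, the toy vocabulary, and the toy matrices are hypothetical and for illustration only. They do not represent the dashboard's actual code.

```python
import numpy as np

def activation_density(acts: np.ndarray) -> float:
    """Fraction of token positions where the feature is active (activation > 0).

    Hypothetical helper; assumes 'active' means strictly positive."""
    return float(np.mean(acts > 0))

def top_logits(decoder_dir: np.ndarray, W_U: np.ndarray, vocab: list, k: int = 2):
    """Project a feature's decoder direction onto the unembedding matrix.

    Hypothetical helper. decoder_dir has shape (d_model,), W_U has shape
    (d_model, vocab_size). Returns the k most positive and the k most
    negative (token, score) pairs."""
    scores = decoder_dir @ W_U            # one score per vocabulary token
    order = np.argsort(scores)            # ascending order of score
    pos = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    neg = [(vocab[i], float(scores[i])) for i in order[:k]]
    return pos, neg

# Toy data (invented): the feature fires on 1 of 4 token positions.
acts = np.array([0.0, 0.0, 1.2, 0.0])
vocab = ["alright", "okay", "However", "却"]
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.8, 0.6, -0.4, -0.5],
                [0.0, 0.0, 0.0, 0.0]])

density = activation_density(acts)
pos, neg = top_logits(decoder_dir, W_U, vocab)
print(density, pos, neg)
```

In the toy data, the feature is active at one of four positions, which gives a density of 0.25. "alright" and "okay" come out as the top positive logits, and "却" and "However" come out as the top negative logits. This mirrors the pattern of the lists above, but only by construction.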