INDEX
Explanations
acknowledging a point before contrasting
Negative Logits
不仅  0.57
nejen  0.57
либо  0.56
either  0.54
Either  0.49
настолько  0.45
simplesmente  0.44
либо  0.44
entweder  0.43
不僅  0.42
Positive Logits
ostensibly  0.94
nominally  0.93
superficially  0.86
technically  0.82
certainly  0.82
Certainly  0.79
undoubtedly  0.73
確かに  0.73
admittedly  0.72
certamente  0.71
Activations Density: 0.097%