INDEX
Explanations
introduces qualifications or emphasis
Negative Logits
했는데	0.85
ಿದ್ದು	0.84
인데	0.82
waardoor	0.80
ทำให้	0.78
zodat	0.76
sehingga	0.74
있는데	0.73
。	0.71
waarin	0.71
Positive Logits
paradox	1.37
albeit	1.34
conversely	1.29
ironically	1.29
despite	1.21
unsurprisingly	1.20
surprisingly	1.20
admittedly	1.17
albeit	1.15
contrary	1.12
Activations Density 0.300%