Explanations
punctuation followed by conjunctions
NEGATIVE LOGITS
without    0.51
tanpa      0.47
zonder     0.43
uden       0.43
WITHOUT    0.42
無需       0.42
毫不       0.40
ohne       0.39
без        0.39
بغیر       0.39
POSITIVE LOGITS
而是       0.94
بلکه       0.79
बल्कि      0.79
nor        0.74
sondern    0.72
بلکہ       0.71
Instead    0.71
বরং        0.70
Instead    0.68
nor        0.66
Activations Density: 0.452%
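
The density figure is commonly read as the fraction of sampled tokens on which the feature fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample values are assumptions for illustration, not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (zero by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 tokens, of which 5 have a nonzero activation.
sample = [0.0] * 995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.452% would mean the feature is active on roughly 4 to 5 tokens out of every 1000 sampled.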