Explanations
connections between contrasting ideas or concepts
Negative Logits
not        -2.06
não        -1.73
nicht      -1.71
not        -1.63
niet       -1.63
tidak      -1.59
ikke       -1.56
neither    -1.50
không      -1.48
NOT        -1.47
Positive Logits
而是             0.79
vielmehr         0.77
rungsseite       0.75
بلکه             0.68
autorytatywna    0.64
sondern          0.63
downright        0.58
pyplot           0.58
مرئيه            0.57
むしろ           0.56
Activations Density 0.894%