Explanations
references to biases and misconceptions in perception and evaluation
Negative Logits
shouldBe    -0.15
erif        -0.15
andal       -0.14
oriously    -0.13
(always     -0.13
ifdef       -0.13
enkins      -0.13
_typeof     -0.12
youre       -0.12
etat        -0.12
Positive Logits
doesn        0.71
nicht        0.60
didn         0.60
tidak        0.60
isn          0.59
não          0.56
neither      0.56
wasn         0.56
does         0.55
niet         0.53
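Feature dashboards of this kind usually produce the two logit lists above by projecting the feature's decoder direction through the model's unembedding matrix and keeping the lowest and highest scoring tokens. The sketch below is a minimal illustration under that assumption, not the dashboard's actual pipeline: it uses NumPy, hypothetical names (`top_logit_effects`, `w_dec_row`, `W_U`), toy data, and it ignores any final layer normalization the real model may apply.

```python
import numpy as np


def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Project one feature's decoder direction through the unembedding.

    Assumed shapes (hypothetical inputs):
      w_dec_row: (d_model,)        decoder direction of a single feature
      W_U:       (d_model, d_vocab) unembedding matrix
      vocab:     list of d_vocab token strings
    Returns (positive, negative): the k highest-scoring tokens in descending
    order and the k lowest-scoring tokens in ascending order.
    """
    scores = w_dec_row @ W_U          # one logit contribution per vocab token
    order = np.argsort(scores)        # indices sorted from lowest to highest
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0, 0.0, -2.0]])
w_dec_row = np.array([1.0, 0.5])
pos, neg = top_logit_effects(w_dec_row, W_U, vocab, k=2)
print(pos)  # [('a', 1.0), ('b', 0.5)]
print(neg)  # [('c', -1.0), ('d', -0.5)]
```

Sorting once and reading both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid a full sort.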
Activation Density: 1.812%
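Activation density is the share of token positions on which the feature is active; at 1.812%, that is roughly 18 of every 1,000 tokens. The sketch below assumes "active" means an activation strictly above zero and uses a hypothetical helper name (`activation_density`); the dashboard's exact threshold and token sample are not stated here.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold.

    acts: 1-D sequence of per-token activations for one feature (hypothetical input).
    threshold: assumed cutoff for counting the feature as active (default 0.0).
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy example: the feature fires on 2 of 8 tokens.
acts = [0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]
print(activation_density(acts))  # 25.0
```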