INDEX
Explanations
references to unusual or unconventional characteristics
Negative Logits
arian      -0.17
apor       -0.16
è±Ĭ        -0.15
tee        -0.15
owitz      -0.15
toi        -0.14
azer       -0.14
raphics    -0.14
Mori       -0.14
Kra        -0.14
Positive Logits
ball       0.40
ities      0.32
yssey      0.31
balls      0.31
-ball      0.27
ity        0.26
-number    0.24
ments      0.23
/e         0.23
Ball       0.22
Activations Density 0.010%