INDEX
Explanations
comparative phrases and contrasts between entities or situations
Negative Logits
idon        -0.17
alach       -0.15
regunta     -0.14
ÙĪÛĮÙĦ      -0.14
allen       -0.14
²           -0.14
à¹Īà¸Ńà¸Ļ   -0.14
alat        -0.14
mez         -0.13
flen        -0.13
Positive Logits
же          0.15
438         0.14
lish        0.14
olly        0.14
rát         0.14
-redux      0.14
ạo          0.13
433         0.13
iles        0.13
824         0.13
Activations Density 0.175%
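
The page does not say how the density figure is computed. A minimal sketch, assuming it is the share of sampled token positions on which the feature's activation is above zero; the function name, the threshold parameter, and the sample data are hypothetical and chosen only to reproduce a value of 0.175%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled token positions
    on which the feature fires at all. The source does not confirm this.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 7 of 4000 token positions.
sample = [0.0] * 3993 + [1.2] * 7
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.175% would mean the feature is sparse, firing on roughly 1 in 570 sampled tokens.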