Explanations
phrases related to questions and their corresponding answers
Negative Logits
igi             -0.17
------+------+  -0.15
Ì£              -0.15
enez            -0.15
akis            -0.14
quez            -0.14
Ñijм            -0.14
ernal           -0.14
wolf            -0.14
ewis            -0.14
Positive Logits
phone           0.16
nable           0.15
truth           0.15
idual           0.15
stell           0.15
affen           0.15
/address        0.14
IFS             0.14
aries           0.14
hip             0.14
Activations Density 0.051%