INDEX
Explanations
questions in text
references to questions and answers
Negative Logits
ufact       -0.78
orpor       -0.77
ected       -0.76
rites       -0.75
agically    -0.71
oiler       -0.66
axy         -0.60
Tycoon      -0.59
ont         -0.59
emulate     -0.58
Positive Logits
naires      1.59
naire       1.44
answered    1.24
answ        1.13
Answer      1.11
answer      1.08
asked       1.07
answered    1.07
posed       1.07
questions   0.97
Activations Density 0.064%