INDEX
Explanations
questions in a text
references to inquiries and questioning
Negative Logits
Token       Logit
ufact       -0.97
axy         -0.79
yss         -0.76
minist      -0.75
oreal       -0.74
Tycoon      -0.73
rites       -0.72
ensions     -0.69
alty        -0.68
rylic       -0.68
Positive Logits
Token       Logit
naires      1.40
naire       1.12
answered    1.08
posed       1.03
questions   0.96
Questions   0.85
unanswered  0.84
probing     0.82
pertaining  0.81
asked       0.80
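The two logit tables rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A common way to obtain such numbers, assumed here rather than confirmed by the dashboard, is to take the dot product of the feature's decoder direction with each token's unembedding vector and then keep the k most negative and k most positive results. The sketch below uses pure Python with a tiny hypothetical two-dimensional model; `logit_effects`, `top_and_bottom`, and the toy vectors are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed definition: dot product of the feature's decoder direction
    with each token's unembedding vector (hypothetical helper)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (most positive first, most negative first), k tokens each."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative effects first
    positive = ranked[::-1][:k]  # most positive effects first
    return positive, negative


# Toy example with made-up 2-d vectors (not real model weights).
decoder_dir = [1.0, -0.5]
unembed = {
    "questions": [0.8, -0.4],
    "asked": [0.6, 0.0],
    "ufact": [-0.7, 0.5],
    "rylic": [-0.2, 0.3],
}
effects = logit_effects(decoder_dir, unembed)
pos, neg = top_and_bottom(effects, k=2)
print("positive:", pos)
print("negative:", neg)
```

Sorting once and slicing from both ends keeps the sketch simple; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.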
Activation Density: 0.033%
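Activation density is usually read as the share of sampled tokens on which the feature fires at all. A density of 0.033% therefore means roughly 33 tokens in every 100,000. The sketch below assumes "fires" means an activation strictly above zero; the `threshold` parameter and the `activation_density` name are hypothetical, and the dashboard's own sampling and threshold are not stated here.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Made-up sample: 33 active tokens out of 100,000.
density = activation_density([0.0] * 99_967 + [1.3] * 33)
print(f"{density:.3f}%")
```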