INDEX
Explanations
phrases indicating possession or presence of specific items or inquiries
Negative Logits
amburger    -0.15
idente      -0.14
Probably    -0.14
istr        -0.14
izo         -0.14
رخ    -0.13
/ubuntu     -0.13
ادن    -0.13
ービ    -0.13
most        -0.13
Positive Logits
questions   0.34
any         0.33
ever        0.28
Questions   0.27
questions   0.26
ANY         0.24
Any         0.22
Questions   0.22
question    0.22
-any        0.21
Activations Density 0.104%
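
The page does not define the density figure. One common reading is the fraction of sampled token positions at which the feature's activation is above zero. Under that assumption, a minimal sketch of the calculation could look like the following; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a position counts as "active" when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: the feature fires on 1 of 1,000 token positions.
toy = [0.0] * 999 + [2.5]
density = activation_density(toy)
print(f"{density:.3%}")
```

Read this way, a density of 0.104% would mean the feature is active on roughly one token position in a thousand.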