Explanations
questions or prompts indicated by a question mark at the end of a sentence
questions and inquiries
Negative Logits
cipled        -0.72
fanc          -0.72
fut           -0.70
tables        -0.66
migr          -0.65
lap           -0.65
anyahu        -0.63
transgress    -0.63
bidden        -0.63
opath         -0.63
Positive Logits
Seems         1.15
Would         1.14
Obviously     1.13
Explain       1.11
Answer        1.10
Personally    1.06
Especially    1.01
Does          0.99
Were          0.99
Did           0.98
Activation Density: 0.083%
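An activation density of 0.083% means the feature fires on roughly 1 in 1,200 tokens. The following is a minimal sketch that assumes density is the share of sampled token positions where the activation is above zero. The dashboard does not state its threshold or token sample, and the function name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions where the activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with activation > 0;
    the exact threshold and token sample used by the dashboard are not stated.
    """
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the feature is considered to be firing.
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Illustrative data only: a feature that fires on 1 of 1,200 tokens.
acts = [0.0] * 1199 + [3.2]
print(f"{activation_density(acts):.3%}")  # 0.083%
```

A low density like this is consistent with a narrowly targeted feature, here one tied to questions and prompts.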