Explanations
questions
questions in the text
Negative Logits
corrid      -0.76
swamp       -0.67
lock        -0.66
inclusive   -0.65
portal      -0.65
ikuman      -0.64
agne        -0.64
éĹĺ         -0.64
der         -0.64
booked      -0.63
Positive Logits
Nope        1.37
Absolutely  1.31
Probably    1.29
Certainly   1.29
Possibly    1.28
Maybe       1.14
Perhaps     1.13
Does        1.13
Yes         1.13
Surely      1.12
Activations Density: 0.090%