INDEX
Explanations
questions
questions relating to personal experiences and opinions
Negative Logits
cipled    -0.73
lap       -0.71
migr      -0.67
fanc      -0.64
swamp     -0.64
anyahu    -0.62
embro     -0.61
fut       -0.60
ufact     -0.60
lifes     -0.60
Positive Logits
Answer       1.06
Obviously    1.05
Would        1.04
Explain      1.01
Answer       0.99
Seems        0.98
.?           0.97
Certainly    0.96
Were         0.93
Probably     0.92
Activations Density: 0.080%