Explanations
phrases indicating uncertainty or indecision
phrases expressing confusion or uncertainty about actions or decisions
Negative Logits
Token       Logit
urance      -0.62
members     -0.61
quad        -0.60
Confeder    -0.58
Vox         -0.57
artisan     -0.57
ĸļ          -0.57
vouchers    -0.57
Nurs        -0.57
creator     -0.56
Positive Logits
Token           Logit
eat             1.01
choose          0.95
give            0.94
take            0.94
characterize    0.93
interpret       0.92
classify        0.92
blame           0.92
inflict         0.92
expect          0.90
Activations Density: 0.066%