INDEX
Explanations
questions with 'what' as the first word
questions that seek clarification or further information
Negative Logits
aku       -0.78
ishable   -0.76
coil      -0.71
padded    -0.69
rek       -0.69
onde      -0.68
aper      -0.68
binge     -0.66
kee       -0.65
worm      -0.65
Positive Logits
Well      1.02
Surely    0.99
Probably  0.97
Nope      0.96
?:        0.95
.?        0.93
����      0.90
Why       0.90
Turns     0.89
Perhaps   0.87
Activations Density: 0.100%