INDEX
Explanations
questions and requests for feedback or input
conditional phrases and questions related to user opinions or responses
Negative Logits
migr          -0.68
gamma         -0.67
nightly       -0.67
spir          -0.66
nons          -0.61
carriers      -0.59
glyph         -0.58
£ı            -0.58
footh         -0.58
stret         -0.57
Positive Logits
Answer        1.37
Answer        1.07
Nope          1.04
?????-?????-  0.98
swers         0.90
answered      0.80
.?            0.79
answer        0.79
?????-        0.79
answer        0.77
Activations Density: 0.275%