INDEX
Explanations
questions or statements with uncertain or subjective implications
Negative Logits
joining    -0.71
Notable    -0.69
spir       -0.65
gamma      -0.65
padding    -0.61
anim       -0.59
owship     -0.59
cele       -0.59
mark       -0.58
knit       -0.58
Positive Logits
Answer     1.71
Answer     1.51
answer     1.42
swers      1.27
answers    1.25
answered   1.20
answer     1.10
reply      0.97
Nope       0.96
answ       0.94
Activations Density 0.314%