INDEX
Explanations
phrases related to question and answer formats
Negative Logits
ufact      -0.75
raint      -0.68
etheless   -0.60
fell       -0.57
ipedia     -0.57
wolf       -0.56
Goldman    -0.56
fitted     -0.56
fitting    -0.54
neb        -0.53
Positive Logits
FAQ        0.91
Answer     0.90
Q          0.88
Reply      0.85
Ds         0.79
Cs         0.78
answered   0.78
answer     0.78
Answer     0.76
LD         0.75
Activations Density: 0.022%