INDEX
Explanations
questions in a specific format, likely indicating structured interviews or conversations
questions and inquiries involving 'Q' designations
Negative Logits
(token missing)  -0.75
endanger  -0.72
olesc  -0.68
degener  -0.67
flock  -0.66
continu  -0.66
remnant  -0.63
flood  -0.62
purse  -0.62
flare  -0.62
Positive Logits
Explain  1.30
Speaking  1.15
How  1.14
Were  1.14
Congratulations  1.13
Interesting  1.13
How  1.13
What  1.13
Tell  1.13
Alright  1.13
Activations Density 0.096%