Explanations
references to transitions or connections between ideas
Negative Logits
cats          -0.64
ridge         -0.63
intimidated   -0.62
ゼウス        -0.60
ourn          -0.59
alse          -0.59
taste         -0.59
ellig         -0.58
endors        -0.58
aughs         -0.56
Positive Logits
Conclusion    0.85
WHY           0.84
why           0.82
…)            0.81
…]            0.80
QUEST         0.79
why           0.79
question      0.75
...)          0.75
cue           0.71
Activations Density: 0.481%