INDEX
Explanations
opinions or speculative statements
expressions of opinions or assertions
Negative Logits
Notable       -0.71
sing          -0.67
ership        -0.65
Merit         -0.64
comed         -0.64
cele          -0.63
Frames        -0.63
Afterwards    -0.62
]).           -0.61
knit          -0.61
Positive Logits
answer        1.56
Answer        1.46
Answer        1.42
answers       1.29
swers         1.23
answered      1.09
answ          0.97
answer        0.95
reply         0.94
answering     0.89
Activations Density 0.532%