Explanations
Numbers that are answers to questions
Instances of the word "answer"
Negative Logits
ovie       -0.76
wana       -0.76
ammy       -0.71
zinski     -0.70
erker      -0.69
sights     -0.68
ker        -0.67
compuls    -0.67
Nanto      -0.67
enic       -0.66
Positive Logits
answer      1.25
answer      1.19
ysis        1.15
swers       1.15
answ        1.07
Answer      1.03
answering   1.00
answers     0.98
answered    0.96
Answer      0.96
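The two logit tables rank tokens by how strongly this feature pushes their logits up or down. The page does not say how these values are computed. The sketch below assumes a common approach: take the dot product of the feature's decoder direction with each token's unembedding vector (ignoring the final layer norm), then keep the k largest and k smallest values. The function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical and used only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical: direct effect of a feature on each token's logit,
    taken as dot(decoder direction, token unembedding). Final layer norm ignored."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive effects (largest first) and the
    k most negative effects (most negative first)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]
    top = ranked[::-1][:k]
    return top, bottom

# Toy example with a 2-dimensional residual stream (made-up numbers).
decoder = [1.0, 0.0]
unembed = {"answer": [1.2, 0.3], "ovie": [-0.8, 0.5], "the": [0.1, -0.2]}
top, bottom = top_and_bottom(logit_effects(decoder, unembed), k=1)
print(top, bottom)
```

In this toy setup "answer" receives the largest positive effect and "ovie" the most negative one, mirroring the shape of the tables above. The actual values on the page come from the real model's weights, which are not shown here.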
Activation density: 0.017%
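Activation density is the share of tokens on which the feature fires. A value of 0.017% works out to about 17 tokens in every 100,000. The minimal sketch below assumes "fires" means an activation strictly greater than zero, measured over some fixed evaluation set of tokens. Both the threshold and the dataset are assumptions, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical: fraction of tokens whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("activation_density needs at least one token")
    return active / total

# 17 active tokens out of 100,000 reproduces the density shown above.
acts = [0.0] * 99_983 + [1.0] * 17
print(f"{activation_density(acts):.3%}")  # prints "0.017%"
```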