INDEX
Explanations
answers or responses to questions
affirmations or definitive answers to questions posed within the text
Negative Logits
Token         Logit
¥µ            -0.76
ivities       -0.62
Orient        -0.61
apons         -0.60
idi           -0.60
hobbies       -0.60
ombat         -0.59
iannopoulos   -0.58
spir          -0.58
activities    -0.57
Positive Logits
Token         Logit
YES           1.13
yes           1.08
YES           1.01
yes           0.94
affirmative   0.86
Nope          0.84
Yes           0.80
nil           0.78
answer        0.76
brainer       0.74
Activations Density: 0.148%