INDEX
Explanations
questions posed around a variety of topics
the word "whether" and associated questions or considerations
Negative Logits
aging     -0.83
viron     -0.73
izing     -0.71
ija       -0.71
ages      -0.71
thal      -0.70
Eye       -0.69
ursed     -0.67
bing      -0.67
ID        -0.66
Positive Logits
soever    1.35
yip       0.88
whether   0.80
whether   0.80
ornia     0.72
aminer    0.70
nodd      0.69
warr      0.67
reluct    0.67
terday    0.65
Activations Density 0.032%