INDEX
Explanations
the word "explain" and possibly words related to scientific explanations or theories.
explanations or theories
phrases related to providing explanations
Negative Logits
engeance    -0.83
nown        -0.79
illet       -0.77
inal        -0.76
jet         -0.76
thritis     -0.74
nir         -0.72
naissance   -0.72
ascus       -0.71
atri        -0.70
Positive Logits
why           1.15
WHY           1.10
explanations  0.98
ĸļ            0.93
explan        0.93
explain       0.93
Explain       0.92
why           0.92
explains      0.87
urated        0.86
Activations Density 0.020%