Explanations
phrases related to scientific explanations and reasoning
Negative Logits
verbatim    -0.16
nga         -0.16
hled        -0.13
?>"/>↵      -0.13
keh         -0.13
ộc          -0.13
eling       -0.13
oplan       -0.13
arga        -0.12
voj         -0.12
Positive Logits
explanation     1.13
explain         1.12
explanations    1.07
explaining      1.05
explains        1.02
explained       1.00
Explanation     0.97
Explain         0.92
explain         0.91
Explanation     0.91
Activations Density 0.288%