INDEX
Explanations
phrases indicating reasoning or explanation
phrases that indicate reasons or justifications
Negative Logits
yss      -0.85
ILCS     -0.79
chn      -0.77
bats     -0.75
OLOGY    -0.73
chin     -0.73
fman     -0.71
ps       -0.70
hess     -0.70
abytes   -0.70
Positive Logits
exist        0.78
inaction     0.78
doubt        0.75
hesitation   0.75
discrepancy  0.75
dispute      0.74
why          0.74
existence    0.72
justify      0.72
skepticism   0.72
Activations Density 0.099%