INDEX
Explanations
references to academic studies and research findings
Negative Logits
è¡          -0.15
elucid      -0.15
503         -0.14
çIJĨè§£      -0.14
ìķĪ         -0.14
wit         -0.13
κÏĮ         -0.13
uben        -0.13
justify     -0.13
agnostics   -0.13
Positive Logits
found       0.35
finds       0.29
found       0.29
find        0.28
finding     0.28
findings    0.26
looked      0.26
FOUND       0.26
find        0.26
.find       0.24
Activations Density: 0.079%