INDEX
Explanations
mentions of the name "John" in various contexts

Negative Logits
Token        Logit
/testing     -0.16
Leakage      -0.15
zens         -0.15
Leak         -0.14
CTest        -0.14
//{{         -0.14
KNOWN        -0.14
ator         -0.14
hail         -0.14
instein      -0.13

Positive Logits
Token        Logit
features     0.18
cov          0.15
uo           0.14
iry          0.14
feature      0.14
fore         0.14
PEC          0.14
CHE          0.14
.ma          0.14
_relative    0.14

Activation density: 0.023%