INDEX
Explanations
the word "surrogate" or variations of it
references to surrogates in various contexts
Negative Logits
inness    -0.91
INESS     -0.84
ulhu      -0.75
azes      -0.73
alach     -0.72
OA        -0.70
oak       -0.70
IST       -0.70
endon     -0.69
ovie      -0.69
Positive Logits
surrog        1.58
surrogate     1.37
Parenthood    0.73
gest          0.73
riages        0.70
laun          0.69
ãĤ§           0.68
encour        0.67
airst         0.65
lod           0.64
Activations Density: 0.007%