Explanations
references to people's backgrounds or experiences
Negative Logits
apa          -0.16
ardi         -0.16
oton         -0.15
iro          -0.15
ew           -0.15
isk          -0.15
ib           -0.14
ää           -0.14
Discovery    -0.14
baz          -0.14
Positive Logits
/background   0.18
bench         0.16
reten         0.16
educt         0.15
backgrounds   0.15
Enumerator    0.15
background    0.15
èĥĮæĻ¯        0.14
Background    0.14
lad           0.14
Activations Density 0.009%
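
The positive-logit token èĥĮæĻ¯ reads like encoding debris, but it is consistent with how a byte-level BPE vocabulary displays multi-byte UTF-8 text. The sketch below assumes the feature comes from a model with a GPT-2-style byte-to-character table (the page does not say which tokenizer is used); the helper names bytes_to_unicode and decode_token are illustrative, not part of the dashboard. Under that assumption the token decodes to the Chinese word 背景 ("background"), which fits the explanation given above.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code point 256 + n so it has a visible glyph.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


def decode_token(token):
    """Map a displayed vocabulary string back to the UTF-8 text it stands for."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("èĥĮæĻ¯"))
```

If the model uses a different tokenizer, the mapping does not apply and the token should be read as shown.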