INDEX
Explanations
references to personal backgrounds or experiences
Negative Logits
ardi          -0.14
[             -0.14
ew            -0.14
ibel          -0.14
thorough      -0.14
baz           -0.14
ib            -0.14
az            -0.14
abeth         -0.14
pt            -0.13
Positive Logits
/background   0.18
educt         0.16
asser         0.16
lad           0.16
ijkstra       0.15
475           0.14
filmer        0.14
出版社 (Chinese, "publishing house")   0.14
كييف (Arabic, "Kyiv")   0.14
kus           0.14
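Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. The page does not state its method, so the sketch below assumes that one. The helper names `logit_effects` and `top_and_bottom` and the toy matrices are hypothetical and unrelated to the values listed above.

```python
# Hypothetical helpers: assumes logit effects = decoder direction @ unembedding.
# The dashboard's actual pipeline is not shown on this page.

def logit_effects(decoder_row, unembed, vocab):
    """Score each vocabulary token by projecting the feature's decoder
    direction (length d_model) through the unembedding matrix
    (d_model rows x vocab_size columns)."""
    effects = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        score = sum(decoder_row[i] * unembed[i][j] for i in range(len(decoder_row)))
        effects.append((token, score))
    return effects


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[::-1][:k]  # ascending order: most negative first
    return positive, negative


# Toy example with made-up numbers (d_model = 2, vocab_size = 3).
effects = logit_effects(
    decoder_row=[1.0, -0.5],
    unembed=[[0.2, -0.1, 0.0],
             [0.0, 0.4, -0.2]],
    vocab=["tok_a", "tok_b", "tok_c"],
)
positive, negative = top_and_bottom(effects, k=1)
print(positive)
print(negative)
```

Ranking a single score list from both ends yields the two tables at once. That is why a dashboard can show negative and positive logits for the same feature.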
Activation Density: 0.013%
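Activation density is the share of token positions on which the feature fires. At 0.013%, that is about 13 firings per 100,000 tokens. The minimal sketch below assumes "fires" means an activation strictly above zero, because the threshold the dashboard uses is not shown here. `activation_density` is a hypothetical helper, and the sample data is made up to match the reported figure.

```python
# Hypothetical helper; assumes a feature "fires" when its activation exceeds 0.

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation is above threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Made-up sample: 13 firing positions out of 100,000 tokens.
sample = [0.0] * 99_987 + [1.5] * 13
print(f"{activation_density(sample):.3f}%")  # prints 0.013%
```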