Explanations
various nouns and concepts related to personal history and relationships
Negative Logits
onna    -0.15
ept     -0.15
BOVE    -0.15
alo     -0.15
Wah     -0.14
enta    -0.14
uz      -0.14
amped   -0.14
tape    -0.14
leness  -0.14
Positive Logits
旧           0.20
-old         0.16
old          0.15
(old         0.15
/Peak        0.14
Äiju         0.14
old          0.14
/new         0.14
yny          0.14
-fashioned   0.14
Activation Density: 0.111%