INDEX
Explanations
elements related to relationships and human connections
Negative Logits
atest      -0.15
/misc      -0.15
-found     -0.15
CADE       -0.14
ationale   -0.14
ungen      -0.14
ivet       -0.14
818        -0.13
ossible    -0.13
Cue        -0.13
Positive Logits
both       0.20
indeed     0.17
both       0.17
even       0.17
både       0.16
actually   0.16
ë²Į        0.15
nejen      0.15
Both       0.15
actual     0.14
Activations Density 0.013%