Explanations
significant concepts related to personal or moral integrity
Negative Logits
YC         -0.19
ogan       -0.19
enis       -0.16
elier      -0.15
vince      -0.15
Linh       -0.14
δή         -0.14
coles      -0.14
meni       -0.14
usch       -0.14
Positive Logits
either      0.27
EITHER      0.24
Either      0.21
either      0.20
Either      0.18
somewhere   0.15
ンズ        0.15
ek          0.15
ither       0.14
ortho       0.14
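Each logit value gives a token's direct response to the feature: how strongly the feature pushes that token's output logit up (positive) or down (negative). One common way dashboards compute such lists is to dot the feature's decoder vector with every token's unembedding column, then keep the k largest and k smallest results. The sketch below assumes that method. The page does not say whether a final layer norm or other scaling is folded in. `logit_effects` and `top_logits` are hypothetical helper names, and the vectors in the usage example are toy numbers, not this feature's weights.

```python
# Minimal sketch (assumption): a token's logit effect is the dot product of the
# feature's decoder vector with that token's unembedding column.
from typing import List, Sequence, Tuple


def logit_effects(decoder_vec: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return one logit effect per vocabulary token.

    `unembed` is laid out d_model x vocab: each row is one residual-stream
    dimension and each column is one token.
    """
    if len(unembed) != len(decoder_vec):
        raise ValueError("decoder_vec length must equal the number of unembed rows")
    vocab = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        for t in range(vocab)
    ]


def top_logits(effects: Sequence[float], tokens: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) (token, effect) pairs."""
    if len(effects) != len(tokens):
        raise ValueError("need exactly one effect per token")
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest effects first
    positive = ranked[::-1][:k]    # largest effects first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (hypothetical values).
    tokens = ["either", "somewhere", "YC", "ogan"]
    decoder_vec = [1.0, 0.5]
    unembed = [[0.2, 0.1, -0.1, -0.2],
               [0.1, 0.1, -0.1, 0.0]]
    pos, neg = top_logits(logit_effects(decoder_vec, unembed), tokens, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Sorting the full vocabulary is fine for a sketch. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest`/`heapq.nsmallest` avoids the full sort.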
Activation density: 0.003%
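Activation density is usually the fraction of sampled tokens on which the feature fires. The page does not state the activation threshold or the sample size, so the sketch below assumes "fires" means an activation above zero. `activation_density` is a hypothetical helper name. At 0.003%, the feature is active on roughly 3 tokens out of every 100,000.

```python
# Minimal sketch (assumption): a token counts as active when the feature's
# activation on it exceeds `threshold` (default 0.0).
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Toy sample: 100,000 tokens, of which 3 activate the feature.
    acts = [0.0] * 100_000
    for idx, value in [(10, 1.2), (500, 0.4), (9000, 2.0)]:
        acts[idx] = value
    print(f"{activation_density(acts):.3%}")  # formats the fraction as a percentage
```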