INDEX
Explanations
concepts related to philosophical discussions about power and transformation
Negative Logits
lg          -0.16
wers        -0.16
awai        -0.15
sep         -0.15
enberg      -0.15
upy         -0.14
irit        -0.14
ando        -0.14
hausen      -0.14
anken       -0.13
Positive Logits
itself       0.63
unto         0.58
alone        0.40
themselves   0.39
Alone        0.29
alone        0.29
herself      0.28
сами         0.27
сама         0.27
induction    0.25
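The two lists above are the tokens with the highest and lowest logit weights for this feature. As a minimal sketch, assuming the dashboard already has one scalar weight per vocabulary token (how that weight is derived is not stated here), selecting both lists is a top-k / bottom-k ranking. The function name `top_logits` is hypothetical, and the sample uses only a few values copied from the lists above.

```python
import heapq


def top_logits(
    weights: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k highest-weighted and k lowest-weighted (token, weight) pairs.

    Hypothetical helper: assumes `weights` maps each token to one scalar weight.
    """
    positive = heapq.nlargest(k, weights.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, weights.items(), key=lambda kv: kv[1])
    return positive, negative


# A few (token, weight) pairs taken from the lists above.
sample = {
    "itself": 0.63,
    "unto": 0.58,
    "themselves": 0.39,
    "lg": -0.16,
    "upy": -0.14,
    "anken": -0.13,
}

pos, neg = top_logits(sample, 2)
print(pos)  # [('itself', 0.63), ('unto', 0.58)]
print(neg)  # [('lg', -0.16), ('upy', -0.14)]
```

`heapq.nlargest`/`nsmallest` avoid sorting the whole vocabulary, which matters when the weight table has tens of thousands of entries.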
Activation density: 0.080%
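A density of 0.080% means the feature is active on roughly 8 of every 10,000 tokens. A minimal sketch of that arithmetic, assuming "active" means an activation strictly above a threshold of 0 (the dashboard's exact threshold is not given here); `activation_density` is a hypothetical helper, and the input is synthetic, chosen only to reproduce the percentage.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: the threshold of 0.0 is an assumption.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic example: 8 active positions out of 10,000 tokens.
acts = [0.0] * 9992 + [1.0] * 8
print(f"{activation_density(acts):.3f}%")  # 0.080%
```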