INDEX
Explanations
terms related to philosophical concepts
Negative Logits
Token     Logit
e         -0.41
t         -0.32
eck       -0.30
i         -0.29
eh        -0.28
eut       -0.28
eel       -0.28
ebo       -0.28
s         -0.27
eam       -0.27
Positive Logits
Token     Logit
aurus     0.35
copy      0.34
hiba      0.32
patial    0.29
keleton   0.29
otros     0.26
ynthesis  0.25
ystem     0.25
ystems    0.24
ocial     0.24
Activation Density: 0.027%