INDEX
Explanations
terms related to evaluations of value and reality within philosophical or psychological contexts
Negative Logits
prom
-0.15
Flames
-0.14
rec
-0.14
ightly
-0.14
ellas
-0.14
Gore
-0.14
Victorian
-0.13
(',',$
-0.13
.lab
-0.13
intr
-0.13
Positive Logits
upo
0.17
̧
0.16
ãĥ¼ãĥŃ
0.16
باز
0.15
ucs
0.15
ê³
0.15
aks
0.14
Ashton
0.14
Ùĩر
0.14
olm
0.14
Activations Density 0.004%