Explanations
statements that assert or define the nature of love
Negative Logits
Token      Logit
ĥ½         -0.16
antry      -0.15
allet      -0.14
-eslint    -0.14
931        -0.13
vitam      -0.13
ıklı       -0.13
ercial     -0.13
tone       -0.13
930        -0.13
Positive Logits
Token           Logit
åĴ²             0.16
orio            0.15
reta            0.15
arov            0.14
wholes          0.14
داÙĨÙĦÙĪØ¯    0.14
ĭ               0.14
berger          0.13
izar            0.13
ovol            0.13
Activations Density 0.406%