INDEX
Explanations
phrases indicating a conversational or relational context
Negative Logits
newPos     -0.15
newX       -0.14
avec       -0.14
Abram      -0.14
neutral    -0.14
iques      -0.14
usalem     -0.14
Fog        -0.14
LLP        -0.13
ê´Ģ        -0.13

Positive Logits
note       0.15
ductive    0.15
ften       0.15
asta       0.14
OLON       0.14
UED        0.14
EMALE      0.14
ensis      0.14
reff       0.14
CDF        0.14

Activations Density: 0.018%