INDEX
Explanations
terms that indicate self-reflection and introspection
Negative Logits
يتيمه  -0.72
ejus  -0.72
whor  -0.70
coisa  -0.70
робнее  -0.69
Dumas  -0.69
paravant  -0.69
covariance  -0.68
оригіналу  -0.67
brancas  -0.67
Positive Logits
reflected  2.28
reflecting  2.26
reflection  2.25
reflect  2.24
reflections  2.16
Reflect  2.16
reflects  2.16
reflect  2.05
refle  1.97
Reflection  1.92
Activations Density: 0.071%