INDEX
Explanations
themes of hypocrisy and disconnection between stated beliefs and actions
Negative Logits
Token       Logit
ancia       -0.19
CEEDED      -0.15
ancias      -0.15
ë»          -0.14
OLEAN       -0.14
ä¼Ļ         -0.14
æŀĿ         -0.14
ilian       -0.14
à¹Ĥà¸Ĺ      -0.13
almÄ±ÅŁ     -0.13
Positive Logits
Token       Logit
real        0.23
actual      0.23
any         0.21
actually    0.20
anything    0.19
actual      0.19
genuine     0.19
basic       0.18
REAL        0.17
acknowledge 0.17
Activations Density: 0.383%