INDEX
Explanations
self-reflective and introspective statements
instances of self-reflection and personal responsibility statements
Negative Logits
代         -0.72
PI         -0.69
éĹĺ        -0.67
dayName    -0.65
oided      -0.65
kat        -0.64
arettes    -0.64
ãĤµ        -0.64
tnc        -0.64
911        -0.63
Positive Logits
nonetheless    0.99
nevertheless   0.83
etheless       0.83
alas           0.81
persisted      0.80
persists       0.78
curiously      0.77
tons           0.76
chers          0.76
importantly    0.74
Activations Density 0.262%