INDEX
Explanations
references to 'truth' and related concepts of honesty and transparency
Negative Logits
Hij       -0.15
ICLE      -0.15
iac       -0.14
equal     -0.14
haus      -0.14
hire      -0.14
zet       -0.14
ไว        -0.14
herited   -0.13
hof       -0.13
Positive Logits
fulness   0.35
fully     0.34
iness     0.28
FUL       0.22
serum     0.22
FULL      0.21
Serum     0.21
full      0.20
worthy    0.19
ilde      0.18
Activations Density 0.015%