INDEX
Explanations
expressions related to truth-telling and the pursuit of knowledge
Negative Logits
adic      -0.17
ocop      -0.15
illard    -0.15
Stam      -0.14
oogle     -0.14
å½        -0.14
neider    -0.13
221       -0.13
nul       -0.13
ocab      -0.13
Positive Logits
truth     1.05
truth     0.93
Truth     0.88
Truth     0.82
truths    0.77
verdad    0.71
_truth    0.65
truthful  0.60
.truth    0.55
true      0.46
Activations Density 0.197%
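
A density figure like the one above is commonly read as the share of sampled tokens on which the feature fires at all. The sketch below illustrates that reading only. It assumes "active" means an activation strictly above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: density = (number of active tokens) / (number of sampled tokens).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 2 of 1000 tokens.
sample = [0.0] * 998 + [0.7, 1.3]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.197% would mean the feature is active on roughly 2 out of every 1000 sampled tokens.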