Explanations
concepts pertaining to truth and dishonesty
Negative Logits
jstor              -0.70
addPreferredGap    -0.68
ⓧ                  -0.67
╚                  -0.63
وتسجيلات           -0.63
Архівовано         -0.61
Tembelea           -0.61
naires             -0.60
canActivate        -0.60
:+:                -0.59
Positive Logits
truth              3.88
Truth              3.44
truth              3.44
Truth              3.25
TRUTH              3.14
truths             2.72
Wahrheit           2.40
Truths             2.39
verità             2.11
vérité             2.04
Activations Density 0.039%