INDEX
Explanations
statements indicating truth or honesty
references to the concept of truth
Negative Logits
eds      -0.73
anus     -0.71
uled     -0.70
cca      -0.70
hire     -0.68
eda      -0.67
akuya    -0.67
wana     -0.67
fever    -0.67
oso      -0.64
Positive Logits
fulness        0.97
UTH            0.80
displayText    0.74
psons          0.74
extent         0.74
thereof        0.70
Dome           0.69
liest          0.67
truth          0.65
srfAttach      0.65
Activations Density 0.055%