Explanations
phrases mentioning the concept of truth
occurrences of the word "Truth" and related concepts
Negative Logits
uled       -0.75
avy        -0.71
wana       -0.68
joining    -0.65
capacity   -0.63
ATA        -0.63
urations   -0.62
arnaev     -0.62
rotein     -0.61
alian      -0.61
Positive Logits
fulness    1.26
fully      1.19
iness      0.91
telling    0.89
ously      0.85
lyn        0.84
ful        0.83
ulent      0.82
orial      0.81
lessly     0.80
Activations Density 0.020%