INDEX
Explanations
terms related to truth and factual information
references to the concept of "truth."
Negative Logits
senal      -0.70
emetery    -0.69
uled       -0.69
joining    -0.68
ategory    -0.67
rotein     -0.65
ESH        -0.65
avy        -0.65
hyde       -0.65
igree      -0.64
Positive Logits
fully      1.15
fulness    1.06
telling    0.92
dig        0.90
ful        0.87
lyn        0.85
\\\\\\\\   0.85
iness      0.83
seeker     0.82
deal       0.78
Activations Density 0.023%