Explanations
references to truth and its nuances in various contexts
Negative Logits
/***/        -0.97
subsection   -0.92
Fores        -0.88
Huskies      -0.86
}),          -0.86
--)          -0.84
liesslich    -0.82
nakalista    -0.81
GTCX         -0.80
']))         -0.80
Positive Logits
Truth        0.98
Truth        0.98
truth        0.92
truth        0.89
TRUTH        0.87
Meaning      0.78
Morrison     0.73
fulness      0.73
verità       0.72
Meaning      0.72
Activations Density 0.121%