Explanations
phrases containing the concept of "truth."
Negative Logits
Token        Logit
-gnu         -0.16
æij          -0.15
unas         -0.15
ÙĪØ±Ø¯     -0.14
niž          -0.14
ÏĢιÏĥ       -0.14
бÑĥ         -0.14
.localized   -0.14
apt          -0.14
ftware       -0.14
Positive Logits
Token        Logit
auen         0.16
he           0.16
gle          0.15
utsch        0.15
axis         0.15
oss          0.15
RP           0.14
a            0.14
ONT          0.14
408          0.14
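The page does not say how the negative and positive logit lists are produced. A common approach, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and rank tokens by the resulting score. The sketch below uses a hypothetical helper `top_logit_tokens` and made-up toy values, not the numbers listed above.

```python
def top_logit_tokens(decoder_vec, unembed, vocab, k=3):
    """Rank vocabulary tokens by their logit effect for one feature.

    decoder_vec: feature direction in the residual stream (assumed input).
    unembed[i][j]: weight from hidden dimension i to token j (assumed layout).
    Returns (k most negative, k most positive) lists of (token, score).
    """
    scores = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


# Toy example: four hypothetical tokens, two hidden dimensions.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_vec = [1.0, 0.0]
unembed = [
    [-0.16, -0.14, 0.16, 0.15],
    [0.0, 0.0, 0.0, 0.0],
]
negative, positive = top_logit_tokens(decoder_vec, unembed, vocab, k=2)
print(negative)
print(positive)
```

Sorting once and reading from both ends keeps the negative and positive lists consistent with each other.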
Activations Density 0.023%
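Activation density is usually read as the fraction of sampled tokens on which the feature fires. Under that reading, 0.023% corresponds to roughly 23 active tokens per 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and the sample data are illustrative only.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic sample: 100,000 tokens, 23 of which activate the feature.
acts = [0.0] * 100_000
for i in range(23):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.023%
```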