INDEX
Explanations
words or phrases associated with truth and reality
Negative Logits
Token          Logit
lean           -0.16
ReturnType     -0.15
ãĥ¼ãĤ¿ãĥ¼      -0.15
Už             -0.15
ãĥ¼ãĤ¿         -0.15
ãĥĪãĥª         -0.15
tent           -0.14
tridge         -0.14
.Logic         -0.14
rous           -0.14
Positive Logits
Token          Logit
à¥ĭद           0.17
fully          0.16
_escape        0.15
.Tween         0.15
power          0.15
sat            0.14
Hack           0.14
ilde           0.14
assen          0.14
ijo            0.14
Activations Density: 0.139%