INDEX
Explanations
phrases related to making sense or rationality
Negative Logits
_Impl            -0.18
.scalablytyped   -0.17
uma              -0.14
tps              -0.14
Sophia           -0.13
fflush           -0.13
ãĥ³ãĤ¿           -0.13
Suff             -0.13
£i               -0.13
akah             -0.13
Positive Logits
sense            0.59
sense            0.44
Sense            0.41
sentido          0.36
Sense            0.35
sene             0.35
senses           0.34
sens             0.32
SEN              0.32
scn              0.31
Activations Density 0.030%