INDEX
Explanations
phrases and words related to knowledge and awareness
Negative Logits
OLLOW           -0.16
chyb            -0.16
.scalablytyped  -0.15
umi             -0.15
gnu             -0.15
_UNUSED         -0.15
regon           -0.14
jure            -0.14
ungen           -0.14
不要            -0.14
Positive Logits
zero            0.48
absolutely      0.42
little          0.42
ZERO            0.39
zero            0.39
Zero            0.39
little          0.37
-zero           0.37
Zero            0.36
_zero           0.34
Activations Density 0.198%