INDEX
Explanations
phrases indicating potential actions or capabilities
Negative Logits
ä¹ĥ           -0.17
ospace        -0.16
/fixtures     -0.15
eti           -0.15
à¸            -0.15
gii           -0.14
Serialized    -0.14
odox          -0.14
detect        -0.14
icom          -0.14
Positive Logits
seen          0.28
found         0.28
found         0.27
seen          0.25
-found        0.24
FOUND         0.24
viewed        0.24
Found         0.23
Seen          0.22
Found         0.22
Activations Density: 0.030%