INDEX
Explanations
repetitive phrases indicating alternate perspectives or contrasting ideas
Negative Logits
सन            -0.14
åĩĮ           -0.14
xe            -0.14
ÏĥÏĦα         -0.14
обÑĭ          -0.13
dra           -0.13
.properties   -0.13
çļĦäºĭ        -0.13
quo           -0.13
reesome       -0.13
Positive Logits
flip          0.35
contrary      0.31
plus          0.29
flip          0.28
bright        0.28
surface       0.27
upside        0.26
brighter      0.26
whole         0.25
surface       0.24
Activations Density: 0.039%