INDEX
Explanations
words related to visual perception or examining situations
Negative Logits
imity     -0.16
avra      -0.16
ddy       -0.16
upert     -0.15
Díky      -0.15
lek       -0.15
قام       -0.15
ahoo      -0.14
coma      -0.14
utures    -0.14
POSITIVE LOGITS
hol          0.27
beyond       0.26
past         0.24
toward       0.20
af           0.20
favor        0.19
ways         0.19
ahead        0.19
Hol          0.19
seriously    0.19
Activations Density 0.032%