Explanations
phrases related to key takeaways or important points
Negative Logits
◄  -0.21
rå  -0.16
loh  -0.16
पड  -0.16
uell  -0.15
rect  -0.15
apl  -0.15
ness  -0.15
rek  -0.14
ssp  -0.14
Positive Logits
aways  0.27
Take  0.22
Take  0.22
take  0.21
take  0.20
uchi  0.20
TAKE  0.19
hiro  0.19
.Take  0.18
_take  0.18
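The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). Below is a minimal sketch of how lists like these can be produced. The page does not say how its scores are computed, so this assumes each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The function `logit_effects` and the toy vectors are hypothetical illustrations and do not reproduce the numbers above.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Return (k lowest-scoring, k highest-scoring) tokens.

    Assumption (hypothetical): a token's score is the dot product of the
    feature's decoder direction with that token's unembedding vector.
    """
    scores: Dict[str, float] = {}
    for tok, col in unembed.items():
        if len(col) != len(decoder_dir):
            raise ValueError(f"dimension mismatch for token {tok!r}")
        scores[tok] = sum(d * u for d, u in zip(decoder_dir, col))

    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most negative (lowest) scores first
    positive = ranked[::-1][:k]  # most positive (highest) scores first
    return negative, positive


# Toy 2-dimensional example with made-up vectors.
neg, pos = logit_effects(
    [1.0, 0.0],
    {"take": [0.3, 0.5], "aways": [0.4, -1.0], "rek": [-0.2, 0.1], "ness": [0.0, 0.9]},
    k=2,
)
print(neg)
print(pos)
```

Sorting once and slicing from both ends yields both lists from a single pass over the vocabulary. With a real model you would replace the dictionary with a matrix product, but the ranking logic stays the same.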
Activation density: 0.020%
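Activation density describes how often the feature fires: at 0.020%, it is active on roughly 2 of every 10,000 tokens. Below is a minimal sketch of that calculation. It assumes (the page does not say) that "active" means an activation strictly above a threshold of 0. The function `activation_density` and the sample activations are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds `threshold`.

    Assumption (hypothetical): a position counts as active when its
    activation is strictly greater than `threshold`.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Hypothetical activations over 10,000 tokens, two of which are non-zero.
acts = [0.0] * 9998 + [1.5, 0.3]
print(f"Activation density: {activation_density(acts):.3f}%")  # Activation density: 0.020%
```

Because the function consumes an iterable, it can stream activations from a large dataset without holding them all in memory.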