Explanations
No Explanations Found
Negative Logits
Token      Logit
-vers      -0.31
cop        -0.30
zem        -0.29
oud        -0.27
Vers       -0.27
成份       -0.26
成分       -0.26
maxi       -0.26
ophobia    -0.25
Maver      -0.25
Positive Logits
Token      Logit
lib        0.28
mistake    0.26
Lib        0.26
self       0.25
bowl       0.24
lib        0.24
self       0.24
barred     0.24
tip        0.24
binations  0.24
Activations Density: 0.012%
No Known Activations
This feature has no known activations.