INDEX
Explanations
phrases indicating questions or inquiries about actions or roles
Negative Logits
aren          -0.16
fate          -0.16
ego           -0.15
atom          -0.15
oms           -0.15
IRC           -0.15
_super        -0.14
Minds         -0.14
pta           -0.14
.bio          -0.14
Positive Logits
opot          0.17
represent     0.15
_EMIT         0.15
abouts        0.15
stands        0.15
ramework      0.14
emma          0.14
َأ            0.14
lid           0.14
representing  0.14
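Dashboards of this kind typically build the negative and positive logit lists by multiplying the feature's decoder direction by the model's unembedding matrix and keeping the lowest and highest entries. That procedure is an assumption here; this page does not state how its values were computed. The sketch below uses hypothetical helpers `logit_effects` and `top_and_bottom` on a toy 2-dimensional model with a 4-token vocabulary; a real model would supply its own decoder vector and unembedding matrix.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through an unembedding matrix.

    Assumption: decoder_dir has length d_model and w_unembed is a
    d_model x vocab_size matrix given as a list of rows.
    Returns one value per vocabulary token.
    """
    if len(decoder_dir) != len(w_unembed):
        raise ValueError("decoder_dir and w_unembed disagree on d_model")
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(tokens: Sequence[str], effects: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example (hypothetical values): 2-dimensional residual stream, 4 tokens.
tokens = ["a", "b", "c", "d"]
decoder_dir = [1.0, -1.0]
w_unembed = [
    [0.5, 0.25, -0.25, 0.0],
    [0.25, 0.5, 0.25, -0.5],
]
effects = logit_effects(decoder_dir, w_unembed)
negative, positive = top_and_bottom(tokens, effects, k=2)
print(negative)  # [('c', -0.5), ('b', -0.25)]
print(positive)  # [('d', 0.5), ('a', 0.25)]
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; with a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nsmallest` / `heapq.nlargest`) would avoid the full sort.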
Activation Density: 0.077%
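Activation density is commonly read as the share of token positions in a sample on which the feature fires, that is, has an activation above zero. Assuming that definition (the page does not state the threshold or sample size), 0.077% corresponds to roughly 77 firing positions per 100,000 tokens. The hypothetical helper `activation_density` below computes the figure from a flat list of per-token activations.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds threshold.

    Assumption: "firing" means activation > threshold, with threshold 0 by default.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 77 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(77):
    acts[i * 1000] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.077%
```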