Explanations
phrases emphasizing observation or perception
Negative Logits
ôi         -0.14
legates    -0.14
ons        -0.14
atab       -0.14
ó          -0.14
ughter     -0.14
tics       -0.14
erton      -0.14
bra        -0.14
gram       -0.13
Positive Logits
ulong      0.16
etty       0.15
رÙĪØª     0.15
otas       0.15
ehr        0.15
tridge     0.14
unb        0.14
ENCE       0.14
âĶĺ        0.14
_deinit    0.14
Activations Density: 0.019%