Explanations
phrases and expressions that emphasize repetition or enumeration of items
Negative Logits
宝             -0.14
iards          -0.14
539            -0.14
dea            -0.13
beloved        -0.13
pt             -0.13
whereabouts    -0.13
necessary      -0.13
ť              -0.13
ovic           -0.13
Positive Logits
thing           0.42
reason          0.36
problem         0.34
Thing           0.32
funny           0.31
Thing           0.30
interesting     0.29
thing           0.28
Problem         0.27
beauty          0.27
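The page does not say how these logit values were produced. A common convention on sparse-autoencoder feature dashboards is to take the feature's decoder direction, multiply it by the model's unembedding matrix, and list the most negative and most positive token entries. The sketch below assumes that convention. The function names, the d_model-by-vocab matrix layout, and all numbers are hypothetical, and it ignores any final LayerNorm scaling a real model would apply. Because a tokenizer usually keeps a word with and without a leading space as separate entries, that is likely why "thing" and "Thing" each appear twice above.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot a (hypothetical) feature decoder direction with each unembedding column.

    w_unembed is assumed to be laid out as d_model rows x vocab columns,
    so the result holds one logit effect per vocabulary token.
    """
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_dir[r] * w_unembed[r][t] for r in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(tokens: Sequence[str], effects: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]                   # most negative first
    positive = list(reversed(ranked))[:k]   # most positive first
    return positive, negative


# Toy example with d_model = 2 and a 4-token vocabulary (values are made up).
tokens = ["thing", "reason", "dea", "pt"]
decoder_dir = [1.0, 0.5]
w_unembed = [
    [0.4, 0.3, -0.1, 0.0],
    [0.0, 0.1, -0.1, -0.2],
]
positive, negative = top_and_bottom(tokens, logit_effects(decoder_dir, w_unembed), k=2)
print("positive:", [t for t, _ in positive])  # ['thing', 'reason']
print("negative:", [t for t, _ in negative])  # ['dea', 'pt']
```

A full vocabulary would use a matrix library rather than nested Python loops. The ranking step stays the same either way.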
Activation Density: 0.367%
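Activation density is usually reported as the share of token positions on which the feature's activation is above zero. Under that reading, 0.367% means the feature fires on roughly 3.67 tokens per 1,000. The sketch below assumes a zero threshold. The function name and the sample data are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds the threshold."""
    acts = list(activations)
    if not acts:
        return 0.0
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# Toy sample: the feature fires on 3 of 1,000 tokens.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
print(f"{activation_density(sample):.3f}%")  # 0.300%
```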