INDEX
Explanations
mentions of specific locations or contexts in stories
Negative Logits
has     -0.50
is      -0.49
”       -0.47
as      -0.45
“       -0.45
for     -0.43
it      -0.43
{       -0.42
cons    -0.42
}';     -0.41
Positive Logits
lenker            1.04
houſe             1.01
ſmall             0.91
purpoſe           0.90
tartalomajánló    0.89
poffe             0.89
ſtate             0.88
Jefus             0.87
\{\\              0.87
NSCoder           0.86
Activations Density: 0.312%