Explanations
phrases indicating authorship or attribution
Negative Logits
Token       Logit
osterone    -0.70
ãħĭ         -0.70
itled       -0.69
buster      -0.68
ivas        -0.67
bard        -0.65
thal        -0.64
heimer      -0.64
riger       -0.64
ppa         -0.64
Positive Logits
Token         Logit
virtue        1.18
products      1.02
multiplying   0.91
means         0.80
passers       0.80
leaps         0.76
subtract      0.76
invoking      0.76
accident      0.76
combining     0.75
Activations Density: 0.086%