INDEX
Explanation: names, especially the name "Joseph"
Negative Logits
Token      Logit
APD        -0.72
GOODMAN    -0.71
"$:/       -0.70
hips       -0.69
atron      -0.68
ricted     -0.65
idad       -0.63
raints     -0.63
warr       -0.62
ADRA       -0.60
Positive Logits
Token      Logit
smanship    1.04
uth         0.80
hawks       0.77
fires       0.77
fters       0.76
bard        0.72
tein        0.71
fruit       0.71
fire        0.70
ry          0.70
Activation density: 1.549%