Explanations
proper names or nouns with 'vo' in them
references to voices and vocal expressions
Negative Logits
estinal     -0.74
Alph        -0.65
Killer      -0.65
ness        -0.61
backer      -0.60
Pub         -0.60
arity       -0.59
stadt       -0.59
forgiven    -0.58
Buffett     -0.58
Positive Logits
apons       0.91
iture       0.89
yg          0.87
utical      0.87
milo        0.85
hua         0.81
inth        0.79
eer         0.79
illac       0.79
vo          0.77
Activations Density: 0.019%