INDEX
Explanations
mentions of the name "William" in various contexts
Negative Logits
ContentAlignment   -0.80
Motive             -0.79
Daryl              -0.77
Daryl              -0.77
kozó               -0.75
estimés            -0.74
McCartney          -0.73
Quod               -0.71
INIT               -0.71
Rolf               -0.70
Positive Logits
William       1.32
William       1.23
Williams      1.14
WILLIAM       1.08
Williamson    1.05
william       1.03
WILLIAM       1.02
william       0.90
williams      0.85
Willi         0.84
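This page does not define the logit scores. A common construction for sparse-autoencoder feature dashboards, assumed here rather than confirmed by this page, takes the dot product of the feature's decoder vector with each token's unembedding column. It then lists the most positive and most negative tokens. The sketch below illustrates that ranking using hypothetical helpers (`logit_effects`, `top_and_bottom`) and toy numbers. Under this reading, the repeated strings above, such as "Daryl" or "William", are most plausibly distinct vocabulary entries, for example with and without a leading space.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct effect on the logits.
# Assumption: score(token j) = decoder_vec . unembed[:, j]. Names and numbers are illustrative.

def logit_effects(decoder_vec, unembed, vocab):
    """Return a list of (token, score) pairs, one per vocabulary entry.

    unembed is a d_model x vocab_size matrix given as a list of rows.
    Tokens are kept in a list (not as dict keys) so look-alike strings stay separate.
    """
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring and the k lowest-scoring (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # largest scores first
    negative = ranked[:k]        # most negative scores first
    return positive, negative


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["William", " William", "Daryl", "Motive"]
decoder_vec = [1.0, 0.5]
unembed = [
    [1.0, 0.8, -0.6, -0.2],
    [0.6, 0.4, -0.4, -0.8],
]
positive, negative = top_and_bottom(logit_effects(decoder_vec, unembed, vocab), k=2)
print("positive:", [(t, round(s, 2)) for t, s in positive])
print("negative:", [(t, round(s, 2)) for t, s in negative])
```

Keeping scores as a list of pairs, rather than a dictionary keyed by string, means two vocabulary entries that print identically are never merged.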
Activation Density 0.090%
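The page does not say how density is measured. If it is read as the share of token positions on which the feature activates (an assumption), 0.090% works out to roughly 9 firing positions per 10,000 tokens. A minimal sketch follows, using a hypothetical `activation_density` helper, a zero threshold, and a made-up sample.

```python
# Hypothetical sketch of an activation-density statistic.
# Assumption: density = fraction of token positions where activation > threshold.

def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy sample: the feature fires on 9 of 10,000 token positions.
activations = [0.0] * 9_991 + [1.5] * 9
density = activation_density(activations)
print(f"Activation density: {density:.3%}")  # prints "Activation density: 0.090%"
```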