Explanations
the name "Robert" in various contexts and forms
Negative Logits
whoſe      -1.07
itſelf     -0.98
ſhould     -0.94
himſelf    -0.87
cauſe      -0.85
whofe      -0.85
Theſe      -0.84
osť        -0.84
againſt    -0.82
becauſe    -0.82
Positive Logits
Robert     1.38
Robert     1.26
Roberts    1.17
ROBERT     1.16
ROBERT     1.11
robert     1.09
robert     1.09
Roberts    1.05
Rober      1.00
Bob        0.98
Activations Density: 0.009%