INDEX
Explanations
proper nouns related to individuals, specifically names such as Sharon, Leah, and Denise
references to specific names, particularly "Sharon" and other individuals
Negative Logits
exempt      -0.85
notation    -0.81
rity        -0.75
effic       -0.75
ĻĤ          -0.72
credit      -0.71
safety      -0.70
nesota      -0.70
juven       -0.69
undai       -0.69
Positive Logits
Sharon      0.96
Levy        0.92
Blossom     0.84
Yard        0.81
Tate        0.80
Chev        0.78
ville       0.78
Angle       0.76
Katz        0.76
Richards    0.75
Activations Density: 0.007%