INDEX
Explanations
proper nouns, specifically names of individuals
references to specific individuals, particularly those named Braz and Deborah
Negative Logits
arium         -0.91
Sabha         -0.85
nery          -0.82
urtles        -0.79
oldemort      -0.77
apore         -0.77
Parent        -0.74
elia          -0.73
enza          -0.73
ierra         -0.72
Positive Logits
lain          0.87
Braz          0.81
ank           0.74
tons          0.71
hyde          0.70
maid          0.68
microphones   0.68
levard        0.64
medi          0.63
horizont      0.63
Activations Density 0.018%