INDEX
Explanations
references to specific names and characters
First names
names in sequences
Negative Logits
Darius      -0.57
Darren      -0.54
Darren      -0.54
Dutchman    -0.53
Wilfred     -0.53
Rodney      -0.52
♂️          -0.52
Dwayne      -0.52
Eric        -0.51
carlos      -0.51
Positive Logits
<?          0.52
Ann         0.49
rostros     0.48
Rose        0.46
Ann         0.45
unzel       0.43
Grace       0.43
Rose        0.43
oídos       0.43
Grace       0.42
Activations Density: 0.763%