INDEX
Explanations
references to a specific family name
references to a specific family name with various contexts
Negative Logits
ANC     -0.75
ANCE    -0.71
scape   -0.70
ANA     -0.70
REL     -0.69
RW      -0.68
zai     -0.67
âĸ¬     -0.66
BIT     -0.66
gan     -0.63
Positive Logits
hift      0.97
umps      0.93
ules      0.79
manship   0.78
hooting   0.77
paces     0.75
peed      0.75
uge       0.74
etsk      0.74
poons     0.74
Activations Density 0.011%
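
As a rough illustration of what a density figure like this conveys, the sketch below assumes that density means the fraction of token positions on which the feature activation is above zero. That definition, the function name activation_density, and the sample data are all assumptions made for illustration; the source does not say how the 0.011% value was computed. Under that reading, 0.011% corresponds to roughly 11 active tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" is taken to mean the share of token positions
    where the feature fires. The source does not define the term.
    """
    values = list(activations)
    if not values:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical data: the feature fires on 11 of 100,000 token positions.
acts = [1.5] * 11 + [0.0] * (100_000 - 11)
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.011%
```

A density this low would mean the feature is highly sparse: it is inactive on almost all tokens.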