INDEX
Explanations
names of people and specific references to identities
Negative Logits
iali      -0.21
iator     -0.20
iei       -0.20
anon      -0.19
iat       -0.19
iation    -0.18
ial       -0.18
ials      -0.18
iec       -0.17
ied       -0.17
Positive Logits
insky     0.42
ink       0.42
ingle     0.41
ingu      0.41
inger     0.40
inks      0.40
insk      0.39
inct      0.39
inski     0.39
ingly     0.38
Activations Density: 0.051%