INDEX
Explanations
proper nouns related to people's names
names of individuals
Negative Logits
Token       Logit
rob         -0.69
insula      -0.67
yright      -0.67
ciating     -0.65
foss        -0.65
vernment    -0.63
IBLE        -0.63
pencil      -0.62
Predator    -0.62
Circle      -0.62
Positive Logits
Token       Logit
ti          0.95
hak         0.90
onest       0.86
azard       0.85
merga       0.81
mad         0.80
annah       0.80
indu        0.80
yy          0.79
dar         0.78
Activations Density 0.050%