INDEX
Explanations
names or aliases of people, such as stage names or given names
mentions of names
Negative Logits
Token       Logit
yrinth      -0.84
romy        -0.79
IU          -0.67
EMS         -0.66
ownt        -0.65
irth        -0.65
psey        -0.64
saf         -0.64
receptive   -0.61
tumblr      -0.61
Positive Logits
Token       Logit
plates      1.10
paces       1.00
plate       0.98
names       0.97
name        0.96
NAME        0.96
aliases     0.89
names       0.89
ames        0.84
name        0.83
Activations Density: 0.034%