Explanations
famous people's names
Negative Logits
ãĤ©        -0.61
ntil       -0.56
ÑĤ         -0.56
ACTION     -0.54
-+-+       -0.52
ļé         -0.52
conflic    -0.52
%"         -0.52
¿½         -0.52
¿          -0.51
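Several negative-logit tokens (ãĤ©, ÑĤ, ļé, ¿½) look like encoding debris. Assuming the model uses GPT-2's byte-level BPE tokenizer (this page does not name the model), these strings are that tokenizer's printable byte-to-character mapping rather than corruption. The minimal sketch below rebuilds the GPT-2 table and reverses it. The helper names are hypothetical.

```python
def gpt2_bytes_to_unicode() -> dict[int, str]:
    """Rebuild GPT-2's byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable characters map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every other byte (control chars, space, ...) is shifted to code point 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in gpt2_bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a displayed token back to raw bytes, then decode them as UTF-8.

    Incomplete multi-byte sequences become U+FFFD replacement characters.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĤ©", "ÑĤ", "ntil", "¿½"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

If that assumption holds, ÑĤ is the Cyrillic letter т and ãĤ© is the katakana ォ. Fragments such as ¿½ and ¿ are partial UTF-8 sequences that do not decode to a character on their own.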
Positive Logits
sburg       0.62
pedia       0.61
iac         0.55
Heights     0.53
shire       0.53
puff        0.53
IMAGES      0.50
Lodge       0.50
gur         0.50
dale        0.49
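This page does not say how its logit lists were computed, and its values may be normalized or scaled. One common way such lists are produced is to project the feature's decoder direction through the model's unembedding matrix, then keep the k most positive and k most negative tokens. The pure-Python sketch below illustrates that approach under that assumption. The function names, the toy vocabulary, and the toy numbers are all hypothetical.

```python
def logit_attribution(decoder_dir, W_U):
    """Project a (hypothetical) feature decoder direction through the unembedding.

    decoder_dir: list of d_model floats.
    W_U: d_model x vocab_size matrix, given as a list of rows.
    Returns one direct logit contribution per vocabulary token.
    """
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_bottom_tokens(logits, vocab, k):
    """Return the k most positive and k most negative (token, logit) pairs."""
    order = sorted(range(len(logits)), key=lambda j: logits[j])  # ascending
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return positive, negative


# Toy example with made-up numbers (not this feature's real weights).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, 0.0]
W_U = [[0.5, -0.2, 0.1, -0.6],
       [0.3, 0.9, -0.4, 0.2]]
pos, neg = top_bottom_tokens(logit_attribution(decoder_dir, W_U), vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

A real dashboard would do the same projection over the full vocabulary (tens of thousands of tokens) with k = 10, which matches the ten entries shown in each list.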
Activation density: 0.823%
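Assuming activation density means the percentage of token positions on which the feature fires (activation above zero), the metric reduces to a simple count. This page does not define it or state the threshold, so the sketch below treats both as assumptions. Under that reading, 0.823% means the feature is active on roughly 8 of every 1,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    `threshold=0.0` is an assumed default; dashboards may use another cutoff.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy example: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.3, 0.0, 0.0]))  # 25.0
```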