INDEX
Explanations
proper nouns or names
variations of the word "man" in different contexts
Negative Logits
▬           -0.63
autos       -0.61
spot        -0.59
Indian      -0.58
EStream     -0.58
indo        -0.57
sacrific    -0.57
Italy       -0.56
pse         -0.56
IMAGES      -0.55
Positive Logits
ffe         0.86
arson       0.75
ciating     0.74
itely       0.73
aga         0.72
isse        0.72
schild      0.68
acion       0.68
emi         0.68
uve         0.67
Activations Density 0.337%