Explanations
proper nouns related to politics, locations, and organizations
Negative Logits
Token          Logit
senal          -0.63
enegger        -0.57
foundland      -0.56
vertisement    -0.54
GGGGGGGG       -0.52
looph          -0.52
retty          -0.51
Leban          -0.50
Bravo          -0.49
FontSize       -0.49
Positive Logits
Token          Logit
cannot         0.70
could          0.70
deems          0.70
would          0.66
decides        0.66
enters         0.66
existed        0.65
fails          0.64
had            0.64
might          0.64
Activations Density 0.864%