INDEX
Explanations
phrases related to representation or lack thereof
references to representation in various contexts, particularly related to social and cultural themes
Negative Logits
Token       Logit
cake        -0.79
launch      -0.74
urst        -0.74
awar        -0.74
strap       -0.73
imb         -0.68
stead       -0.67
sis         -0.67
sterdam     -0.65
hib         -0.65
Positive Logits
Token             Logit
Represent         1.03
representation    0.94
ational           0.92
ATIVE             0.90
atively           0.89
eering            0.86
ative             0.86
representations   0.85
DonaldTrump       0.80
represented       0.78
Activations Density: 0.031%